AN EXPERIMENTAL STUDY TO MEASURE THE CONTRIBUTION OF
MOTION PICTURES AND SLIDE-FILMS TO LEARNING CERTAIN 
UNITS IN THE COURSE INTRODUCTION TO NURSING ARTS 


LORETTA E. HEIDGERKEN, R.N.
Catholic University of America 


Section I 


BASIC CONCEPTS UNDERLYING STUDY 


A. Introduction 


The growth of hospitals and public health nursing agencies, the public's demand for lower costs and wider distribution of nursing care, the new developments in nursing education as the profession has adjusted to scientific advances and social needs, and the placing of nursing on the college and university campus have all served to create a greater demand for many more and better prepared nurses, so that an improved nursing service may be more widely distributed. (2, 79, 81)


So that a nurse may meet the ever-increasing responsibilities and functions of modern nursing service, the curriculum is necessarily very crowded: it must include the biological, physical, social and psychological sciences and the nursing sciences, as well as an opportunity to learn the art of nursing. (12) Thus, because both more and better prepared nurses are needed, it becomes imperative to use the most effective teaching methods and materials possible to attain the desired objectives. Motion pictures and slidefilms are modern teaching aids which have been widely and effectively publicized in general education and used in business and industry and, more recently, in the military services. Their place in nursing education and their contribution to it have not been studied, and there seemed to be a need for such an investigation.


B. Basic Curriculum in Nursing 


The primary purpose of the curriculum in schools of nursing is to prepare nurses who have a broad yet scientific basis for nursing practice and an adequate social concept of health. This requires not only an opportunity to develop technical skills but also an opportunity to acquire a broad understanding of human behavior. Therefore, the curriculum for the study of nursing must provide a broad background in the social sciences as well as in the biological, chemical, and physical sciences.

An understanding of human behavior is the primary consideration upon which all acceptable nursing depends.


The curriculum for the study of nursing may be divided into three major areas: the science of nursing; the art of nursing; and the ideals, attitudes and appreciations of nursing.


The science of nursing includes knowledge of the scientific principles underlying the various nursing activities. These principles are drawn from anatomy, physiology, chemistry, sociology and psychology. More important than a knowledge of the principles is the ability to apply them in performing nursing activities, for a knowledge of principles is valueless to the nurse unless it can be translated into action in the nursing care of patients.


The art of nursing consists of the ability to carry out the skills essential in giving total nursing care. It includes social, managerial, manipulative, manual, and intellectual skills. The art of nursing is more than the ability to carry out the techniques of nursing. Too frequently, perfection of techniques is made the end of the nurse's education instead of the means to an end: the ability to give good nursing care. Art implies skill, and skill implies an understanding of the facts, the principles, the relationships, and the interrelationships. It implies the ability to make the right judgment in a critical situation which may involve the life of a patient.
The three areas of the curriculum must be so organized that they provide the learning experiences necessary to acquire the knowledge and skills essential to carry out the responsibilities and functions of the modern professional nurse. Learning experiences should be provided in the form of problems taken from nursing situations. The nursing situations should be centered upon the patient and should not consist of a series of isolated activities. Planning and carrying out the nursing activities involved in giving complete nursing care to the patient are the problems which the student must analyze and solve, and the ability to do so is what must be developed in her. The subject matter and the techniques serve as sources for solving these problems. In this way the learning situation is patient-centered, not disease-centered, system-centered, or technique-centered, and the student will think of her patient as a person with a certain type of behavior and not simply as a case number or a treatment number. Thus the theory (subject matter) and the practice (art) of the curriculum are united, so that the student will look upon nursing not as two separate entities but as two aspects, science and art, of one thing: nursing.


Since the objectives of nursing education include the development of an art as well as the acquisition of knowledge, a large part of the curriculum must be devoted to practice, for the development of nursing skills requires repeated, purposeful practice in a variety of life situations under competent guidance. Practice should be considered a condition of learning, not mere repetition. The learning situations selected for practice should provide meaningful experiences, for it is common knowledge that the transfer of learning from one situation to another is roughly proportional to the degree to which the situations are similar in structure and meaning. (25, p. 275) These situations must also provide opportunity for mastery of nursing techniques, and mastery of technique is related to the student's understanding of all the underlying principles: the better she understands the underlying factors and principles, the more meaningful the learning situation will be, and the more readily learning will transfer to other situations.


It is of course impossible to provide practice in all the types of learning situations which the student may later encounter in her nursing experience. However, if she understands the general principles underlying the nursing activity, as well as the general nursing principles of comfort, safety, therapeutic effectiveness, and economy of time and resources, she will be able to transfer learning from one nursing situation to another and to carry out the particular nursing activity in many varied situations.





C. Audio-Visual Materials and Learning


Basic to any problem in learning and teaching is a consideration of the educational processes as well as of the curriculum content. Learning is a change in the organization of behavior which gives an individual more effective control over the conditions of experience. (25, p. 335) Learning proceeds by differentiation, integration, and reorganization of experiences, all interpreted in terms of the individual's past experiences of learning. (25, 84)


If learning is a process in which experience plays so important a role, it is evident that past experience becomes an important condition of the learning activity, and that poverty of ideas results from meager contact with the world of things and processes. Since audio-visual materials can provide experiences of varying degrees of concreteness, they become an important factor in the curriculum. Audio-visual materials are all those materials, such as motion pictures, slidefilms, slides, objects, models, specimens, graphs, radio, and others, that may be used to provide sensory experiences to the learner.


Much of our learning is by means of sense experience, either with reality or with representations of reality. Olsen (54) describes learning experiences as occurring on three levels. Proceeding from the most concrete to the most abstract, he first describes direct, first-hand experience with reality itself; he then proceeds to the next most concrete level, vicarious experiences of reality, in which audio-visual materials are mechanical representations of reality; and finally he moves to the least concrete level, the abstract symbols of reality: language, or word symbols. Combining these three levels, we find that properly organized words supplemented by pictures, graphs, charts, and sound effects can convey with reasonable fidelity ideas that the student has not experienced directly. Therefore, audio-visual materials provide concrete experiences which should help to clarify concepts, thereby making learning experiences more meaningful and combating verbalization. The degree of concreteness varies with the ideas portrayed, the type of medium used, and the skill exhibited by the producer of the audio-visual materials in handling the medium.


Much has been said and written, but very little experimentation has yet been done, about the whole field of methodology as it relates to the best use of the various types of audio-visual materials and to the types of learning situations in which each is best used. However, experts in the field of audio-visual education agree about the general principles of utilization of these materials. The following statement by Hoban agrees with the known principles of learning and might well be used to sum up the best accepted practice regarding the utilization of audio-visual materials: "To be successfully used, films should have a definite and functional relation to other materials and activities within the unit. They should neither be isolated within this series of experiences nor tacked on as an appendage." (36, p. 111)

This statement applies equally well to all types of audio-visual materials. They must be used as integral parts of the unit; that is, they must be selected in terms of what will best help to attain the objectives that have been set up, adapted to the ability and experience level of the students, and used in the way that will best help to attain those objectives. Thus the objectives chosen will help determine the purpose for which these materials are used, how many are used, and in what manner. The criteria of utilization will not be "number of pictures shown," "how many times each picture is shown," or even which periods in the unit are used, but rather the purpose for which the materials are used, how they are used, and who uses them.


With these concepts as the framework, the problem of this investigation was approached. Keeping always in view the need to educate ever-increasing numbers of nurses who have a broad yet sound scientific basis for nursing practice and an adequate social concept of health, the most effective teaching and learning materials were sought. Since motion pictures and slidefilms are among the newer teaching materials which might effectively help to attain such objectives, they were chosen for this investigation. One of the most difficult problems in teaching nursing is helping students transfer their scientific learning to the nursing arts in the form of the scientific principles which underlie the nursing activities. Will motion pictures and slidefilms help in accomplishing these objectives?


D. Analysis of the Problem 
1. Purpose


The primary purpose of the experiment was to determine the contributions of motion pictures and slidefilms to the learning of certain nursing activities taught in the course Introduction to Nursing Arts when these devices were used in the typical teaching situation in schools of nursing offering a three-year curriculum. The nursing activities included were those taught in the Unit on Cardinal Signs and Symptoms (Temperature, Pulse, Respiration and Blood Pressure) and the Unit on Therapeutic Uses of Heat and Cold.


The types of learning measured included vocabulary, facts and principles, application of principles to nursing activities, and attitudes and appreciations. Emphasis was placed on an understanding of the scientific principles underlying the nursing activities studied.


The purpose of the experiment included measurement of the effectiveness of motion pictures used alone, slidefilms used alone, and motion pictures and slidefilms used in combination in the teaching of Nursing Arts. The effectiveness of the order in which the motion pictures and slidefilms were presented was also measured.


A secondary purpose of the study was to determine certain other results, such as (1) the students' attitudes toward the use of motion pictures and slidefilms in the course Introduction to Nursing Arts; (2) the students' opinion of the comparative effectiveness of motion pictures, slidefilms, and lecture-demonstration; (3) the students' opinion of the most effective order of presentation of motion pictures and slidefilms; and (4) the students' opinion of the functions of audio-visual materials in general.


2. Hypotheses 


In order to provide a sound basis for inference, the problems expressed in these purposes were restated as hypotheses in exact terms. These hypotheses were stated in the form of null hypotheses, because a null hypothesis is so stated that the data are given a chance to disprove it. As in all experimental work, this means that one crucial experiment with adverse results could overturn a million favorable experiments. An example of the null hypotheses set up for the experiment is as follows:


Motion pictures, slidefilms, a combination of motion pictures and slidefilms, and neither motion pictures nor slidefilms are equally effective for learning when used in the teaching of Unit One, The Cardinal Signs and Symptoms, in the course Introduction to Nursing Arts.

Similarly, other hypotheses were formulated to include all the problems expressed in the purposes of the experiment. This was necessary so that the proper tests of significance could be applied to determine whether there was a significant difference in achievement among the methods used in this experiment with preclinical student nurses from twelve schools of nursing in Indiana. That is to say, if any differences were found, could they validly be attributed to fluctuations of random sampling or to the methods used in the experiment, and if to the methods, with what degree of confidence?


These null hypotheses form the basis for the experiment and were tested by applying the F-test of significance. The F-test is based on the assumption of random sampling, and therefore randomization was employed in assigning students to sections and in assigning the methods and their order of presentation. Applying a test of significance (the F-test in this case) is actually a process of comparing the observed values (the data obtained) with the values expected on the basis of random sampling. The expected values, or the constants that express the distribution curve used for comparison, are sample values and not population parameters, which are usually unknown. Thus rigorous inferences may be drawn from the data, because the distribution is expressed in terms of quantities computed from the sample; that is, the constants are determined from the sample itself.
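The paper does not describe the mechanics of its randomization. As a minimal modern sketch, assuming hypothetical student identifiers and one section per method, students can be shuffled into sections and the four conditions named in the null hypothesis then assigned to sections at random. The function name `randomize` and the method labels are illustrative assumptions, not the author's procedure.

```python
import random

# Hypothetical labels for the four conditions named in the null hypothesis.
METHODS = ["motion pictures", "slidefilms", "both", "neither"]

def randomize(students, methods, seed=None):
    """Randomly assign students to sections, then methods to sections.

    Sketch assumption: one section per method. Returns a dict mapping
    section number -> (method, list of students in that section).
    """
    rng = random.Random(seed)
    pool = list(students)
    rng.shuffle(pool)                       # random order of students
    k = len(methods)
    # Deal the shuffled students into k sections of (nearly) equal size.
    sections = [pool[i::k] for i in range(k)]
    order = list(methods)
    rng.shuffle(order)                      # random assignment of methods to sections
    return {i + 1: (order[i], sections[i]) for i in range(k)}

students = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical students
plan = randomize(students, METHODS, seed=1)
for section, (method, members) in plan.items():
    print(section, method, members)
```

For the combined condition, the order of presentation (motion picture first or slidefilm first) could be drawn in the same way with `rng.choice`.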


This theory of interpreting data with tests of significance based on the values of the sample itself is known as the Small Sample Theory. It has particular application to problems of research in educational psychology because of the limitations necessarily imposed by the very nature of the populations used. The size of the sample generally depends upon the number of intact groups or small samples which constitute the total sample, rather than upon the number of individual observations. Thus the unit of sampling is most often the class, the school, or the community rather than the individual student. (48, p. 24) Because this fact has been ignored in the past, many research studies in educational psychology have been invalid or misleading.
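To make the point about sampling units concrete, the following sketch collapses individual scores to one mean per school. The schools A, B and C and their scores are hypothetical. Under the view stated above, the degrees of freedom for variation among schools come from the number of schools, not from the number of students.

```python
from collections import defaultdict
from statistics import mean

def unit_means(records):
    """Collapse (unit, score) pairs to one mean score per sampling unit."""
    groups = defaultdict(list)
    for unit, score in records:
        groups[unit].append(score)
    return {unit: mean(scores) for unit, scores in groups.items()}

# Hypothetical test scores: three schools, several students in each.
records = [("A", 70), ("A", 74), ("A", 78),
           ("B", 60), ("B", 64),
           ("C", 80), ("C", 82), ("C", 84), ("C", 86)]

means = unit_means(records)
n_students = len(records)   # 9 individual observations
n_units = len(means)        # but only 3 independent sampling units
print(means)
print("degrees of freedom among schools:", n_units - 1)
```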


3. Experimental Design 


A partially confounded factorial experiment was used in planning the design of the present investigation. This design was used because it enabled the experimenter to test all the hypotheses in question and, on the basis of the results obtained, either to reject or to accept each hypothesis.


In a factorial experiment the design is so planned that a variety of conditions can be investigated at the same time, because they are arranged in such a manner that they are independent of each other; thus sums of squares can be determined for each condition. The treatment of the sums of squares is the basic statistical analysis of a factorial experiment and is better known as Analysis of Variance. The factors investigated in this experiment were two audio-visual materials, motion pictures and slidefilms, and their order of presentation.
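As an illustration only, consider a simplified, balanced 2 x 2 layout: motion pictures absent or present, crossed with slidefilms absent or present, with three hypothetical scores per cell. The sums of squares for each factor and for their interaction can then be computed separately, and together with the within-cell sum of squares they add up to the total. The study's actual design was partially confounded and also involved order of presentation, and this sketch does not reproduce it. The function name is an assumption.

```python
from statistics import mean

def factorial_2x2_ss(cells):
    """Sums of squares for a balanced 2 x 2 factorial design.

    cells maps (a, b), with a and b in {0, 1}, to a list of scores.
    Every cell must contain the same number of scores r.
    """
    r = len(cells[(0, 0)])
    if any(len(v) != r for v in cells.values()):
        raise ValueError("this sketch assumes equal replication per cell")
    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(cell_mean.values())        # equals the overall mean when balanced
    a_mean = {a: mean([cell_mean[(a, b)] for b in (0, 1)]) for a in (0, 1)}
    b_mean = {b: mean([cell_mean[(a, b)] for a in (0, 1)]) for b in (0, 1)}

    ss_a = 2 * r * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = 2 * r * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = r * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b          # interaction of the two factors
    ss_within = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
    ss_total = sum((y - grand) ** 2 for v in cells.values() for y in v)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "within": ss_within, "total": ss_total}

# Hypothetical scores: A = motion pictures (0 absent, 1 present), B = slidefilms.
cells = {(0, 0): [60, 62, 64], (1, 0): [70, 72, 74],
         (0, 1): [66, 68, 70], (1, 1): [74, 76, 78]}
ss = factorial_2x2_ss(cells)
print(ss)
```

With these numbers the sums of squares are A = 243, B = 75, AB = 3, within = 32 and total = 353. Each effect has 1 degree of freedom and the within-cell term has 8, so each F ratio is the effect's sum of squares divided by 32 / 8 = 4.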


The single-factor experiment has been most widely used in problems of education and psychology. Many investigators still consider the study of a single factor at a time an immutable principle of research in educational experiments. It has been, and still is, thought by many that only in the single-factor experiment can the most rigorous available criterion for ideal experimentation be met. Further, it has been supposed that such designs give the greatest objectivity and exactitude and the most freedom from ambiguity. (14, p. 340) However, such narrow restrictions not only reduce the amount of information obtained but also reduce the efficiency of the entire experiment. Moreover, a multi-factor experiment is less artificial and therefore provides a more valid measure of the effects of each factor than does the single-variable method.


In recent years, since Fisher (20) and his students worked out an efficient experimental design and statistical treatment, the analysis of variance, which can be used for extensive treatment of practically all data, experimenters in other fields (such as biology, genetics, and agriculture) have utilized his design and statistical techniques and have developed further techniques of experimentation. Goulden (28), Lindquist (48), Snedecor (63), and others have discussed and illustrated different aspects of the use of analysis of variance in different experimental designs. In view of the wide acceptance of this experimental design and analysis in these fields, it is surprising that it has not been more widely used in educational experimentation.


Through the mathematical device of confounding, it is possible to omit some of the less important replications and so increase the accuracy of the more important comparisons at the expense of comparisons of lesser importance. With partial confounding, the effect of a factor is confounded, or confused, with block (individual) differences in only some of the replications, so that information may be obtained concerning all treatment effects, although some effects will be based on fewer replications. In this experiment the variables are the different methods, teacher differences, and levels of ability of the students. The levels of ability of the students can be determined, but the variations due to the methods and to the teachers cannot be separated independently; therefore, teacher and method are said to be confounded. Also, since there was not a sufficient number of groups to permit replication of all the factors mentioned above, some of the factors were confounded in some of the blocks.
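The textbook case of a 2 x 2 factorial shows what confounding means. Its four treatment combinations are written (1), a, b and ab, and suppose they must be split into two blocks of two, for example two teachers each taking two sections. If one block receives (1) and ab and the other receives a and b, any block difference is added in full to the AB interaction estimate, while the main-effect estimates are left untouched. The numbers below are hypothetical, and this is a general illustration rather than the layout used in the study.

```python
def contrasts(y):
    """Main-effect and interaction estimates for a 2 x 2 factorial.

    y maps the treatment combinations '(1)', 'a', 'b', 'ab' to a response.
    """
    return {
        "A": (y["a"] + y["ab"] - y["(1)"] - y["b"]) / 2,
        "B": (y["b"] + y["ab"] - y["(1)"] - y["a"]) / 2,
        "AB": (y["(1)"] + y["ab"] - y["a"] - y["b"]) / 2,
    }

# Hypothetical responses with no block effect.
base = {"(1)": 62, "a": 72, "b": 68, "ab": 76}

# Block I holds (1) and ab; Block II holds a and b.
# Suppose Block I (say, one teacher's sections) scores 5 points higher.
block_effect = 5
blocked = {k: v + (block_effect if k in ("(1)", "ab") else 0) for k, v in base.items()}

print(contrasts(base))
print(contrasts(blocked))
```

The A and B estimates are unchanged, but the AB estimate shifts by exactly the block effect, so the interaction cannot be separated from the difference between blocks. Under partial confounding, a second replicate would confound a different effect, and AB could still be estimated from the replicate in which it is not confounded.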


4. Analysis of Variance 


The data obtained from a factorial experiment must be statistically analyzed in such a manner that group differences are taken into account and a valid estimate of error is obtained. This type of analysis is known as Analysis of Variance and was used in this experiment. The basic principle underlying the analysis of variance technique is that the variance of a large sample consisting of a number of small groups may be analyzed into various components, which are expressed in terms of independent degrees of freedom and sums of squares. It is a method which permits the analysis of results from a series of parallel or duplicated experiments, each of which is performed under homogeneous conditions. (48, p. 91)

JOURNAL OF EXPERIMENTAL EDUCATION 265
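The sketch below shows this partition for the simplest case: a one-way layout with the four conditions of the null hypothesis and hypothetical scores. The total sum of squares splits into a between-methods part with k - 1 degrees of freedom and a within-methods part with N - k, and the F ratio is the ratio of their mean squares. To obtain a probability for F you would still need a table of the F distribution or a statistics library, which this standard-library sketch does not include.

```python
from statistics import mean

def one_way_anova(groups):
    """Partition the total sum of squares of several groups of scores.

    groups: a list of lists, one list of scores per method.
    Returns the between- and within-group sums of squares, the total,
    the degrees of freedom, and the F ratio of the mean squares.
    """
    scores = [y for g in groups for y in g]
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ss_total = sum((y - grand) ** 2 for y in scores)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
            "df": (df_between, df_within), "F": f_ratio}

# Hypothetical scores for the four conditions of the null hypothesis.
groups = [[70, 72, 74],   # motion pictures
          [66, 68, 70],   # slidefilms
          [74, 76, 78],   # both
          [60, 62, 64]]   # neither
result = one_way_anova(groups)
print(result)
```

With these scores the between-methods sum of squares is 321 on 3 degrees of freedom and the within-methods sum of squares is 32 on 8, giving F = 107 / 4 = 26.75.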


The techniques of inference, or the statistical analysis, used in this experiment were those originally developed by Yates (76) and modified by Rider (57). Lindquist (48) describes procedures for various factorial designs applied to education but does not include any design with confounding. In order to meet the conditions of this experiment, the technique of inference described by Yates and Rider was modified by Fattu (19), and this modification was followed in the present experiment.


To summarize, a partially confounded factorial experiment was used in the present investigation to study in a single operation a number of samples, in which the groups using the motion pictures and slidefilms were drawn from twelve different populations (twelve schools of nursing). The purpose was to determine whether there were any differences in achievement when these materials were used in various combinations in the typical teaching situation in schools of nursing and, if there were differences, whether they could be attributed to chance, that is, whether the groups could have come from the same total population of preclinical student nurses.


E. Significance of Study 


Scientific research into the contribution of motion pictures and slidefilms to nursing education has almost never been undertaken. Only one study has been found in this field, and in it the motion picture was used as one of the means of effecting change in attitudes toward personal characteristics considered essential for success in nursing. (62)


Many investigations have been made of the use of motion pictures in elementary and secondary schools, but only a few studies have been made with students beyond the high school level. When one compares the use of motion pictures at various levels of education, two conditions have to be considered: first, the broad general areas found in elementary education; and second, the subject-matter concentration found in secondary and higher education. In nursing education both conditions have to be considered.


Motion pictures have long been used in medicine and related health subjects (9, 32, 45). There is, however, very little research evidence about the effectiveness of their use in these fields, and therefore much of their use is based on unverified opinion and faith.


Very little controlled research has been done on the use of motion pictures and slidefilms in higher education. A number of articles have been written about their use, but these articles are mostly based on opinion and not on carefully controlled experimentation. (1, 9, 52)


A careful search of previous investigations reveals practically no research evidence suggesting accepted standards or criteria pertaining to the most effective visual media or methods of utilizing them. It is true that much of the research in visual education has demonstrated that visual aids do contribute to learning; however, many of these studies have been concerned with superficial problems. Dale and Hoban (17), in their review of the literature, point out the inadequacy of the evidence of the effectiveness of visual aids. A plea for systematically planned and better controlled experimentation is made by Carpenter in the following statement:


"... If this promising educational tool, the motion picture, is to be effectively employed, it must be known what it can and cannot do, what its strengths and weaknesses are, and what principles should be employed in the tool's construction to make it of maximum effectiveness for communicating meanings and changing behavior." (10, p. 120)


That this information is lacking is soon apparent when the research studies in audio-visual education are reviewed. Evidence on the suitability of motion pictures and slidefilms as instructional media for the training and education of particular groups is almost completely lacking. (10) This is true of the hard, factual evidence. The reaction of students to motion pictures and slidefilms is still another area which has been relatively untouched. Hoban calls attention to the almost complete absence of research data on the reactions of different groups to the various educational tools. (37) Most research studies have not attempted to go beyond the question of the general values of visual education.








Finally, a review of the experimental studies made with motion pictures and slidefilms disclosed incomplete or inadequately designed experiments and inappropriate tests of significance in many of the studies. The factorially designed experiment, with its analysis of variance, increases the yield of information and provides a valid test of significance when the underlying assumptions have been satisfied. It permits the investigation of a number of factors with greater flexibility and accuracy in interpreting results than the single-factor design which has been so widely used in research in educational psychology. Crutchfield and Tolman (15) point out that not only is there greater economy in the factorial design, but such an experimental set-up is less artificial and hence provides a more valid measure of the effects of each factor than does the single-variable method.


Section II
REVIEW OF LITERATURE 


A review of research studies in the field of visual education shows that probably the first significant educational interest was expressed by Averill in 1915, in an article about the educational possibilities of motion pictures. (4) This was not a report of an experiment but a suggestive article about further work which should be done. Intensive experimental investigation of visual education began shortly after this. One of the first experiments reported was made by David Sumstine; a description of it was published in School and Society in 1918. (67) Although this study was very limited in the light of present knowledge of experimental methods, it was a beginning of experimental investigation in the field.





Since the publication of Sumstine's (67) report, many studies have been made of audio-visual education. Some of these studies have been very extensive, such as Weber's (71), Freeman's (22), and Rulon's (61), while others, such as Doscher's (18) and Jayne's (38), have been very limited. Some of these investigations are open to question because of their very extent or their narrowness. Sturmthal (66) made an exhaustive study of the techniques used in research in visual education and reported that in many cases the analysis was not sufficiently intensive to reveal the essential factors operating to produce the results observed. Atypical classroom situations were often set up for the studies. For example, in one study (71) one group of fourth- and fifth-grade children was shown a film and another group was given a lecture; in another study (22) the experimental group was shown an eight-minute film, the control group was given an eight-minute discussion, and tests were then applied to both groups. The fact that the lecture method is not used with children of that grade level, and that such limited time presented atypical conditions, was not considered. Still another limitation of the early research studies was the type of motion picture used. (22, 71) Many such studies reported that they had used the only films available and that these had been made for advertising rather than designed for educational purposes. Thus the results obtained were reported as if for educational films when the films actually used were not designed for that purpose.


The research done in the past ten years has been considerably improved. However, as late as 1945, Stenius, in his report on the literature in the field of visual education, had this to say:


Research in the field has reflected the status of the program in the schools. There has been no continuous pattern of investigation. For the most part, studies have dealt with specific problems in a rather superficial manner. It is not difficult to match every investigation that has proved that the use of visual-auditory aids resulted in increased instructional effectiveness with one that has shown no added benefit to pupils. (64, p. 244)


This statement of Stenius (64) confirms the indictment made against research in audio-visual education by other writers, such as Long (49), Kinder (42), Hoban (35), and others.


Most of the studies have dealt only with factual materials covering the same areas, and they often resulted in nothing more than an extension of data on problems already investigated, thus making very little meaningful contribution to research in the field. (35) The reported percentage of increase in factual knowledge gained varied considerably among these studies.


The technique of inference used was inadequate in many of the studies reviewed. For example, no tests of significance were employed in the first studies reviewed, and only the probable error of the mean and the standard deviation were used in the next three studies. All the other studies reported, with the exception of the study by Rulon (61) and the University of Minnesota studies (55, 72), used the Critical Ratio as the test of significance. The critical ratio is a statistical technique based on the normal distribution, which is a satisfactory approximation only when samples are large. Its use with the small samples of many of these experiments therefore goes beyond what the data can support: it employs the wrong inferential base and results in conclusions which cannot necessarily be considered valid.
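Comparing cutoffs shows why the critical ratio misleads with small samples. The sketch below uses only the Python standard library. It computes the two-sided 5 per cent critical value of Student's t by numerical integration (Simpson's rule) and bisection, and sets it beside the normal-curve value of about 1.96 that the critical ratio uses. The helper names are illustrative assumptions. In practice the values would be read from a t table or taken from a statistics library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (t is symmetric)."""
    h = x / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, alpha=0.05):
    """Two-sided critical value of t, found by bisection on t_cdf."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)    # cutoff used by the critical ratio
for df in (4, 9, 29, 120):
    print(df, round(t_critical(df), 3), round(z_crit, 3))
```

For 4 degrees of freedom the t cutoff is about 2.78. A critical ratio of 2.0, which the normal-curve rule would call significant, therefore falls well short of significance. The two cutoffs converge only as the degrees of freedom grow large.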


There have been only a few studies concerned with non-factual objectives, such as desirable social attitudes and awareness of problems. Two notable studies of this kind are those of Charters (11) and Ramseyer (56). Ramseyer investigated the effect of the documentary film on social attitudes. He used students from the sixth grade through college, as well as non-college adults, and showed changes in attitude produced by motion pictures. In the Payne Fund Studies, Charters (11) showed that theatrical motion pictures do affect the social attitudes of children. However, little has been done by way of controlled experimentation to show the attitudes of students toward the use of motion pictures and other audio-visual aids under normal classroom conditions.


The unique function of the motion picture itself as used in education is another area which has received very little attention. Only one such study was found in the literature. Keeslar analyzed twenty-four of the better science films very critically and found that 44.1 per cent of the scenes in these films served no unique or specialized function and made no contribution to the major objectives of science education. (40, p. 228) These scenes could actually have been presented on slides as effectively and much less expensively.


The armed services used audio-visual materials very widely as training aids in preparation for war. However, in spite of the extended use of these aids, very little research was done to evaluate and support the extensive use made of them. (53) Those studies which were made and reported can be grouped under the following headings: (1) utilization techniques for motion pictures and filmstrips, (2) special techniques for using training aids and special devices, and (3) opinions of trainees and instructors.


One of the studies concerned with the utilization techniques of motion pictures and filmstrips, reported by Hoban, revealed an increase of as much as 19 per cent in informational material learned as a result of the proper utilization of motion pictures and filmstrips. (37, p. 13) The instruction in this group was comparable to that at the high school level, and the subject was elementary map reading. Other studies were concerned with the attitudes of instructors and trainees who used these training aids. The results reported showed that a large percentage of the men were in favor of the use of training aids. (53, p. 63)


In spite of the weaknesses and inadequacy of 
many of the investigations, there is a high con- 
sistency of data among the studies to indicate 
that the use of motion pictures and other visual 
aids is superior to the use of verbal material 
alone or the unorganized use of other visual 
aids. 


Table I presents a brief summary of the most important investigations reported in the field of visual education, as well as of reviews of those studies having specific reference to the problem of the present investigation. The summary includes the type of audio-visual aid, grade level, number of students, subject, procedure, findings of the investigator, and a critical evaluation of each experiment.
Section III
DESCRIPTION OF THE EXPERIMENT 


The present study, an experiment carried out in twelve schools of nursing in the state of Indiana, was made to determine the contribution of motion pictures and slidefilms to the different kinds of learning which underlie certain nursing activities taught in the course Introduction to Nursing Arts. It was conducted with students of nursing in the typical teaching situation in schools of nursing offering the three-year basic curriculum in nursing. The kinds of learning to be measured included facts and principles, skills, vocabulary understanding, appreciations and attitudes, and application of principles to life situations. Two units of instruction were used: (1) the unit on the Cardinal Signs and Symptoms - Temperature, Pulse, Respiration and Blood Pressure; and (2) the unit on Therapeutic Uses of Heat and Cold.


The description of this experiment includes a detailed account of the selection of the schools of nursing, of the units of instruction, and of the motion pictures and slidefilms used. Since the units of instruction as well as the measuring instruments had to be developed for this experiment, their development, including the validation procedures and reliability measures employed, is described. The design of the experiment, the basis upon which it was planned, and the techniques of inference employed are also explained. Finally, the details of the administration of the experiment are outlined.


A. Selection of Schools of Nursing


The experiment on motion pictures and slide- 
films was conducted in twelve schools of nurs- 
ing in the state of Indiana which will be referred 
to as classes 1 to 12. A total of 405 pre-clin- 
ical student nurses participated in the study. 
The number of sections into which the students 
were already divided was retained, but the stu- 
dents were reassigned to sections randomly, 
using Fisher’s Table of Random Numbers. 
There were two sections each in eight schools, 
three sections each in three schools, and four 
sections in one school, making a total of 29 sec- 
tions in all. 
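The random reassignment of students to a fixed number of sections can be sketched as follows. This is a minimal Python illustration, assuming a seeded pseudorandom generator as a modern stand-in for Fisher's printed Table of Random Numbers. The function name and the class size of 35 are hypothetical. It also checks the section count reported above.

```python
import random


def assign_to_sections(students, n_sections, seed=None):
    """Randomly reassign students to n_sections sections of near-equal size."""
    rng = random.Random(seed)  # pseudorandom stand-in for a printed random-number table
    shuffled = list(students)
    rng.shuffle(shuffled)
    # Deal the shuffled list round-robin so section sizes differ by at most one.
    return [shuffled[i::n_sections] for i in range(n_sections)]


# Sections per school as reported: two in eight schools, three in three, four in one.
sections_per_school = [2] * 8 + [3] * 3 + [4]
print(sum(sections_per_school))  # 29

# Hypothetical class of 35 students split into two sections.
sections = assign_to_sections(range(1, 36), 2, seed=1948)
print([len(s) for s in sections])  # [18, 17]
```

Dealing round-robin after one shuffle keeps every ordering equally likely while guaranteeing balanced section sizes.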


The twelve schools of nursing chosen for the study represent all of the schools of nursing in the state of Indiana (except those in the South Bend-Fort Wayne area) which had a pre-clinical class large enough to make participation possible. The class had to be large enough to permit sectioning, since at least two sections were wanted in each school, and too small a number of students would have required unnecessary duplication of teacher time if sectioned.


The instructors of Introduction to Nursing Arts in each school served as teachers in the experiment. All the teachers who participated in the study were carefully interviewed beforehand, since only teachers who were interested in cooperating fully were wanted. All of the administrators and teachers of the schools of nursing participating in the experiment gave their full cooperation and were most conscientious in carrying out in detail all directions and plans for the study.




In general the program of studies is the same 
for all schools of nursing within a state in the 
first nine months of the basic curriculum of 
nursing. This is true because the program of 
studies for the schools of nursing is outlined by 
the State Board of Nurse Examiners. However, the sequence of courses and the sequence of content within the courses are not fixed; therefore, considerable variation is possible. In order to make certain that the content of the two courses of study with which this study was concerned (Anatomy and Physiology, and Introduction to Nursing Arts) was the same in all schools of nursing, a conference was held with the teachers of both courses at the beginning of the semester.
The instructor of the Anatomy and Physiology 
course was requested to cover units on skin, 
circulatory system, and respiratory system as 
early as possible since these systems were the 
ones concerned in the units to be taught in the 
Nursing Arts course for the experiment. 


The program of studies for the pre-clinical students in this experiment varied from 23 to 30 hours of class a week, including laboratory classes in the Nursing Arts laboratory. In addition, a laboratory period of supervised practice was carried out in the wards of the hospital, ranging from 2 to 9 hours weekly. The time devoted to the Nursing Arts course ranged between 7 and 10 hours per week. Classes were held in the regular Nursing Arts laboratory in all the schools.


B. Selection of Units 


A unit on Cardinal Signs and Symptoms - Temperature, Pulse, Respiration, and Blood Pressure - and a unit on The Therapeutic Uses of Heat and Cold from the course Introduction to Nursing Arts were chosen for two reasons. First, the curriculum for the student of nursing includes theory and practice, which are closely integrated throughout the clinical period, making the program of studies rather inflexible. Since this study was to encompass students from a number of schools of nursing, it would have been very difficult to carry out an experiment during this period of the student's training. It was therefore necessary to select a subject from the pre-clinical period of the curriculum, when it is possible to adjust the courses of study so as to obtain uniformity between schools. The pre-clinical period lasts for the first four to six months of the curriculum. Second, these units were chosen because educational motion picture films designed for use in this course were available.


The course Introduction to Nursing Arts is a 
required course in all curricula for students of 
the basic curriculum in nursing. (77) It is the 
first course in nursing and builds a broad foundation in general nursing principles and practices. In this course the student nurse is prepared for actual contact with patients and for their nursing care.


C. Planning the Units 


Before any valid comparison of learning achievement could be made between groups, it was necessary that all teachers include the same content in their unit plans. Consequently it was necessary to develop the two units to be used in the experiment. Sources consulted for the selection of objectives included (1) A Curriculum Guide for Schools of Nursing (77), published by the National League of Nursing Education, and (2) all Nursing Arts textbooks containing principles of nursing practice published in the last ten years.


A central objective was set up for each unit, 
which was further analyzed into the understand- 
ings - including facts and principles, skills, at- 
titudes and appreciations necessary to attain the 
central objectives. Emphasis was placed on an understanding of the scientific principles underlying the nursing activities studied, as well as on skill in applying these principles in carrying out the activities. The concept of understanding as interpreted by the Commission on the Function of Science in General Education can very well be adapted to nursing education. (82)
The committee sets forth the framework in 
which this information can operate and build 
understandings. In the report the committee 
says: 





Thus, before understanding can take place, it 
is necessary that the facts, generalizations, 
principles, habits, attitudes, and appreciations 
operative in the situation be related to each 
other. A principle, for example, becomes an 
understanding for an individual only when and 
as it opens up new relationships not previously 
recognized and so brings about a change in behavior.


The importance of understanding the scientific principles underlying nursing activities has been explained by many writers on nursing education. Stewart (65) writes that nursing has become more than an art; it is an applied science which requires for its safe practice the application of scientific principles. Brown (8), in discussing professional education for the nursing of the future, points out that the demands and responsibilities of modern nursing require considerable knowledge of the physical, biological, and social sciences, as well as judgment, initiative, and skill in determining the facts from which conclusions can be deduced. Many scientific principles, intricately interwoven, are represented in every nursing procedure and activity. (6, 7) These principles are the pertinent facts from the sciences, assembled, correlated, and applied to nursing activities.


After the objectives were definitely formu- 
lated and stated, the content and related learn- 
ing and teaching activities needed to attain the 
objectives were selected. The activities se- 
lected were lecture-demonstration by the teach- 
er; practice and return demonstration by the 
student in the laboratory and hospital ward; the 
use of motion pictures, slidefilms, and other visual aids, such as objects, models, and specimens needed for a demonstration. The objectives, content, and time plan were then compiled into complete unit outlines and sent to each teacher participating in the experiment.


In order to validate the units they were sub- 
mitted to seven subject-matter experts, who 
were recognized and experienced teachers of 
Nursing Arts, for their evaluation and comment 
on the proposed units. The units were also 
tried out in a preliminary trial experiment 
which was made in two schools of nursing as 
part of the procedure of validating the tests. 
Suggestions and results obtained from these 
three sources - the subject-matter experts, 
the trial experiment, and the teachers partici- 
pating in the study - formed a basis for re- 
vising the units. Thirteen hours were planned 
for Unit One and fifteen hours for Unit Two. 
This time allotment included one hour each for 
pretest and for final test for each unit. 


Teachers were not required to follow the Unit 
plans in any fixed detail but were asked to be 
sure they included all the objectives and con- 
tent outlines and that the content be broken 
down into two major blocks. This was necess- 
ary so that the motion pictures and slidefilms 
could be used at the same period in the units 
in all the schools. Aside from this, the teacher was permitted to follow her own plan, since every effort was being made to preserve the normal teaching situation in the school. The same textbook was used in all the schools.


D. Selection of Motion Pictures and Slidefilms 


There are relatively few sound motion pict- 
ure films and slidefilms which can be matched 
to the educational ability levels and to the ob- 
jectives for teaching students of nursing. The 
writer felt fortunate in being able to secure 
films which met these conditions. The follow- 
ing sound motion pictures and slidefilms were 
selected for use in the experiment: 


1. The Vital Signs and Their Interrelation. 32 minutes

2. The Therapeutic Uses of Heat and Cold, Part I: Administering Cold Applications. 20 minutes

3. The Therapeutic Uses of Heat and Cold, Part II: Administering Hot Applications. 19 minutes


These films were produced by the Division of 
Visual Aids for War Training, United States Of- 
fice of Education, specifically for use in the 
basic curriculum in nursing. These films were 
planned to illustrate and demonstrate scientific 
principles underlying certain selected nursing 
activities rather than any specific techniques. 


Since the motion pictures, slidefilms, and an Instructor's Manual were planned as a unit covering the same content, the one as a sound motion picture and the other as a series of still pictures, it was possible to have almost identical pictorial content available in two different media for study in this experiment.
Most of the pictures used for the slidefilm are 
still pictures lifted out of the films. 


E. Administration of the Experiment 


Preliminary conferences were held with the 
Director of Nurses and instructors participat- 
ing in the study in August and September, 1947 
to make plans for initiating the experiment. 
Further conferences were held later to discuss 
the final plans: the teaching and examination 
procedures to be followed by the teachers. A 
copy of the Instructor’s Manual planned to 
accompany each motion picture and slidefilm, 
together with mimeographed general directions, 
was given to each teacher. The details of util- 
ization, particularly with reference to the pre- 
sentation of the motion pictures and slidefilms 
were discussed in conference. In all cases, these visual materials were integrated into the units. Preceding each film presentation, an attempt was made to motivate learning and to direct attention to important things to look for in the pictures. A discussion of similarities to and differences from the procedures in the students' own school followed the showing of the motion pictures and slidefilms. With one exception, the motion pictures and slidefilms were made part of the regular class work; the exception was the slidefilm in the Sbe order, which was shown at the first class as an introduction and at the last class as a summary. Motion pictures were shown twice in each section in which they were used.


Copies of examinations and answer sheets 
were wrapped, sealed, and mailed to the teacher 
assigned to give the examination. It had been 
requested that someone other than the instruc- 
tor who was to teach the class give the pretests 
so that her teaching would not be conditioned by 
information obtained on the tests. 


Pretests were administered to the classes in all the schools on Monday, November 8, by a teacher of the school of nursing other than the Nursing Arts teacher, and teaching of the units was begun. Each section in each
class was taught separately but by the same 
teacher. As soon as instruction in Unit One 
was finished, the test which had been used for 
pretesting was given as a final test. Immedi- 
ately after the final test, the pretest for Unit 
Two was given and instruction begun in this 
unit. Since assignment to methods and order 
of presentation was done through random selec- 
tion for each unit, it was necessary to reassign 
combinations to the sections for Unit Two. The 
experiment extended from November 8 until 
December 10. 


F. Testing Program 


The testing program included three types of 
measurement: (1) intelligence test, (2) achieve- 
ment test, and (3) attitude scale. 


The American Council on Education Psycho- 
logical Examination, College Edition, was used 
as one of the means of determining ability 
levels of the students. 


The achievement tests and attitude scale were 
constructed by the writer since there were no 
standardized tests available covering the phases 
of this investigation within the subject-matter 
area of the units used. In the construction of the tests, currently accepted theory and practice in test construction, as described by Lindquist (48), Ross (59), and Ruch (60), was observed to the fullest extent possible.
The steps used in validation of achievement 
tests were, (1) Analysis of Course of Study, (2) 
Judgment of Subject Matter Experts, (3) Use in 
Preliminary Experiment, and (4) Item Analysis. 


1. Construction of Achievement Tests 


Two achievement tests were constructed, one for each unit. The same test was used as a pretest and as a final test. As already indicated, objectives were formulated for the units, and the achievement desired was defined in terms of facts and concepts; skills; principles and generalizations; appreciations; and application to life situations. The tests were designed to measure understandings and not mere factual learning. Tyler (69) has shown that the mere verbal learning of facts and principles does not necessarily lead to their integration and application. By implication, these conclusions point to the necessity of learning in the functional situation, in which principles are abstracted from data and applied to data, if the learning is to be functional.


The student nurse must have more than a body 
of facts, verbally learned. She is constantly 
confronted with relatively unfamiliar situations 
which she must be able to meet confidently and 
intelligently. This she cannot do if she is 
equipped only with barren facts and formal 
skills. She must have an understanding of the 
principles, the situations, the condition of her 
patients, and an ability to apply this knowledge 
to the care of her patients. Therefore, typical 
nursing situations which the student nurse en- 
counters in the hospital ward were described; 
then a list of nursing activities from which 
one had to be chosen as representing the best 
answer was given. Following this, principles that supported various nursing activities were stated, and the student was asked to select the one which supported her choice of nursing activity.


To illustrate this type of test situation, two 
items from the test on Unit I are given. 


Situation I. Mr. and Mrs. Evans were 
admitted to the ward following an auto- 
mobile accident. Mrs. Evans has head 
injuries and a fractured arm which is in 
a cast. Mr. Evans was apparently unin- 
jured. You were assigned to care for 
both patients. The doctor’s orders were: 


(1) Blood pressure stat. and q.1 hr. for 
12 hours, then q.4 hrs. 

(2) T.P.R. and B.P. q. 4 hours, and 

(3) No stimulants. 


1. When you went in to take Mrs. Evans’ 
temperature, pulse, respiration and 
blood pressure, she was having a chill. 
Her skin was moist and cool. Your 
nursing knowledge would direct you to: 


a. Close the windows and doors
b. Reassure the patient immediately
c. Apply external heat
d. Take the pulse and respiration
e. Take an oral temperature

The Correct Response





(1) a.b. (2) a.e. (3) all except e. (4) none 
of these (5) all except c. 


2. The principles underlying your choice: 


a. The most important means of con- 
trolling heat loss is through the skin. 

b. Emotions such as fear depress the 
nervous system which results in 
lowered temperature. 

c. Taking temperature at this time would 
be indicated. 

d. The pulse and respiration would in- 
dicate the patient’s condition. 


e. The surrounding air currents affect 
body temperature. 


The Correct Response 





(1) a.d. (2) b.c. (3) All except d. (4) None 
of these (5) All except c. 


This section of the test was organized under the 
heading of tion and Application of 
Facts and Principles and constitutes half of the 
test. This section will be referred to as Part 
One of both tests. 





A necessary condition for adequate understanding and application of principles is knowledge of the facts and principles themselves. Therefore, twenty items which gave a sampling of the facts and principles relating to the nursing activities studied were included in each of the tests. These are organized under the heading of Facts and Principles and will be referred to as Part Two of both tests.


A recognition of the meaning of important technical terms is essential for the nurse in order for her to describe and record the symptoms and conditions of the patients and the treatments given. Therefore, a third part of the test included Vocabulary items and will be referred to as Part Three of both tests.


It is realized that measurement of one of the 
important objectives listed in the unit was not 
provided for in the test: the measurement of 
the skill in performance of the nursing activi- 
ties studied. It is quite possible to make tests 
which measure performance with considerable 
precision and that have high validity and re- 
liability. It was not possible for the writer to 
measure performance of the students in this 
experiment because the techniques of nursing 
procedures vary greatly in the different hospit- 
als, and thus no uniform procedure was avail- 
able for comparison purposes. However, in- 
direct measurement of performance of nursing 
activities was obtained through the verbal test 
because typical nursing situations were set up 
as problems for the students to solve.


The items selected were checked against the unit plans, the textbook of the course, and the film content to ensure adequate sampling.


2. Judgment of Competent Persons 


The tests were submitted for criticism and comment to the subject-matter experts who had evaluated the units. As a further check on validity and reliability, the tests were submitted to the teachers participating in the experiment for their critical evaluation after the units were completed.


3. Preliminary Experiment 


A preliminary experiment was carried out in two schools of nursing to check the planned procedure and technique of the experiment and to improve the units. This trial experiment also provided data on the tests which could be analyzed and thus used to improve the validity and reliability of the tests used in the final experiment. All the conditions previously outlined for the experiment were followed except the use of slidefilms and random selection. The two schools of nursing each had 35 students, divided into two sections: (1) experimental and (2) control. The proposed units were followed. The tests were administered as pretest and final test, one for each unit. The scores on the total test and on each item were tabulated, after which an item analysis was made for difficulty and discriminating power.
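An item analysis for difficulty and discriminating power can be sketched as follows. The source does not name the indices used, so this Python sketch assumes two common conventions. Difficulty is the proportion of students answering an item correctly. Discrimination is the proportion correct in the upper group minus that in the lower group, where the groups are the top and bottom 27 per cent of students ranked by total score. The function name and data layout are hypothetical.

```python
def item_analysis(responses, fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    responses: one list of 0/1 item scores per student.
    difficulty: proportion of all students answering the item correctly.
    discrimination: proportion correct in the upper group minus the
    proportion correct in the lower group, the groups being the top and
    bottom `fraction` of students ranked by total score.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    ranked = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    k = max(1, round(fraction * n_students))  # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(n_items):
        difficulty = sum(r[j] for r in responses) / n_students
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


print(item_analysis([[1, 1], [1, 0], [0, 1], [0, 0]]))  # [(0.5, 1.0), (0.5, 1.0)]
```

Items that nearly everyone passes or fails, or that the weaker students pass as often as the stronger ones, are the usual candidates for revision.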


The procedure and technique planned for the study proved to be efficient and successful and therefore were retained for the general experiment. However, upon the recommendation of the two teachers participating in the preliminary experiment, the time allotment was changed: Unit One was changed to thirteen hours and Unit Two to fifteen hours.


4. Final Revision of Tests 


On the basis of the item analysis, the evaluation and ratings by the teachers, and the data from the trial experiment, the tests were revised. The items of the tests were arranged so that separate scores could be obtained for each of the three parts of the tests. The final revision of the test for Unit One resulted in 90 items and that for Unit Two in 85 items. The tests were constructed so that answer sheets for machine scoring could be used, and they were scored as the number of right answers.


The reliability coefficients computed by the odd-even technique on 50 students selected at random from the total group and corrected with the Spearman-Brown prophecy formula were: Unit One, pretest .84 and final test .65; Unit Two, pretest .72 and final test .91. The odd-even reliability of .65 seemed suspiciously low, so it was decided to find the test-retest reliability. The test-retest reliability coefficient for the final test proved to be .92 for Unit One and .91 for Unit Two.
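The odd-even technique with the Spearman-Brown correction can be sketched as follows. Each student's score on the odd-numbered items is correlated with her score on the even-numbered items. The half-test correlation r is then stepped up to full length with 2r / (1 + r). This is a minimal Python illustration. The function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, length_factor=2):
    """Predicted reliability of a test length_factor times as long."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def odd_even_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per student, in test order."""
    odd = [sum(s[0::2]) for s in item_scores]   # items 1, 3, 5, ...
    even = [sum(s[1::2]) for s in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


print(round(spearman_brown(0.5), 3))  # 0.667
```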


5. Construction of the Attitude Scale 


In evaluating the effectiveness of any teaching 
material it is to be remembered that all the 
advantages which accrue to the student cannot be 
measured directly. The reaction of the student 
to the use of motion pictures and slidefilms in 
part determines their effectiveness as learning 
aids. Accordingly, it was felt early in this investigation that, to get a reasonably complete evaluation of sound motion pictures and silent slidefilms with respect to teaching Nursing Arts, it was necessary to obtain the students' opinions of (1) their value as learning and teaching aids, (2) their value as aids for the learning of specific nursing activities, (3) the effectiveness of their order of presentation, and (4) their functions.


The items on the attitude scale were organized 
into three parts. In the first part, seven state- 
ments were made about the effectiveness of 
sound motion pictures, slidefilms, and the lect- 
ure-demonstration method as compared with 
each other, and the student was asked to rate 
them by stating whether she strongly agreed, 
agreed, was uncertain, disagreed, or strongly 
disagreed. 


The second part of the attitude scale asked for 
the student’s rating of motion pictures, slide- 
films, combination of both, or neither in the 
teaching of each one of the nursing activities 
studied. This section also made provision for 
the selection of order of presentation of the 
teaching aids preferred by the students. 


Last, the student was asked to give her general feeling toward the use of audio-visual aids in the teaching of Nursing Arts by rating the functions generally defined for audio-visual aids.
These are: (1) promoting interest; (2) stimula- 
ting to further study; (3) increasing amount of 
learning; and (4) making learning easier. 


The attitude scale required about ten minutes 
to fill out and was administered immediately 
after the final test was given for Unit Two. The 
scores were tabulated for each item on the scale 
and organized into four groups according to the 
methods used. 
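Tabulating the attitude-scale responses item by item and method group by method group amounts to counting ratings. The Python sketch below assumes each response is recorded as a (group, item, rating) tuple and uses the five-point scale of the first part of the scale. The group labels, function name, and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Five-point scale used in the first part of the attitude scale.
SCALE = ("strongly agree", "agree", "uncertain", "disagree", "strongly disagree")


def tabulate(responses):
    """Count ratings per (method group, item).

    responses: iterable of (group, item_number, rating) tuples, where group is
    a hypothetical label such as "M", "S", "MS", or "C".
    """
    table = defaultdict(Counter)
    for group, item, rating in responses:
        if rating not in SCALE:
            raise ValueError(f"unexpected rating: {rating!r}")
        table[(group, item)][rating] += 1
    return table


counts = tabulate([("M", 1, "agree"), ("M", 1, "agree"), ("C", 1, "uncertain")])
print(counts[("M", 1)]["agree"])  # 2
```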





G. Design of the Experiment 


A 2x2 factorial design with partial confounding was used as the pattern for the experiment. It was a design with two factors used at three levels in twenty-four treatments. The variables in the design were (1) two methods, motion pictures and slidefilms, and (2) the order in which they were presented, or the levels of use. The methods used and the designation for each were as follows:*

Methods                              Designation
Motion Pictures                      M
Slidefilms                           S
Motion Pictures and Slidefilms       MS
Neither of the two (Control)         C


The order of presentation in which these 
methods were used and the designation for 
each were as follows: 


(1) Motion pictures used at the beginning of the unit and again in the middle of the unit, designated as Mbm

(2) Motion pictures used in the middle of the unit and again at the end of the unit, designated as Mme

(3) Slidefilms used throughout the unit, that is, pictures were used as they related to the topic under discussion, designated as St

(4) Slidefilms used at the beginning of the unit and again at the end of the unit, designated as Sbe


Since there were not enough sections to provide for complete replications of all possible method and order-of-presentation combinations, partial confounding was used. M and S were confounded in the experiment, while MS was not confounded. This gave the combined effect MS the greatest opportunity to show up and increased the precision of the measurement of the MS effect. However, in the combinations of the order-of-presentation replications, M and S appear twice as frequently as the combination MS. Thus, by confounding M and S separately, all effects are evaluated with approximately the same degree of accuracy.


The combinations resulting from union of the 
methods and order of presentation and the 
number of replications of each combination 
are given in Table II. 





*Symbols used in dissertation were A for motion pictures, B for slidefilms, AB for combination 
of motion pictures and slidefilms, and (1) for control. 
Since there were twelve schools of nursing, each representing a class participating in the experiment, they were arranged in the over-all design as squares or plots. Each square was further divided into two or more sections. The number of sections already set up in each school was retained as the number of sections used; however, the students were reassigned by random selection. There were 29 sections already set up in the schools, and since only 24 were used for the analysis of variance, the five extra sections served as additional control groups and were used in the computation of means and standard deviations for each of the four total groups, namely M, S, MS, and C.


The twelve classes were arranged into six blocks. Methods were assigned randomly to each class and to each section as follows: squares I, II, III, and IV were respectively assigned MbmSt and St, Mbm and C, MbmSt and Mbm, St and C; squares I', II', III', and IV' were respectively assigned MmeSt and St, Mme and C, MmeSt and Mme, St and C; and squares I'', II'', III'', and IV'' were respectively assigned MbmSbe and Sbe, Mbm and C, MbmSbe and Mbm, Sbe and C. Random assignment was done through random selection using Fisher's Table of Random Numbers. (21)


H. Statistical Analysis 


Scores were obtained and recorded separ- 
ately for parts one, two, three, and total for 
the pretest and the final test for each unit. 
Gains were determined for parts one, two, 
three, and total on achievement tests for each 
unit. These gains and the total scores for the 
pretest and the final test for each unit were 
used throughout the analysis of the experimen- 
tal results. Group means were obtained for 
each section, since all analysis employed 
group means and not individual scores. 
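The gain scores and section means described above can be sketched in Python. Each student's gain on a part is her final-test score minus her pretest score. The total gain is the sum of the part gains, since the total score is the sum of the three parts. The section mean of each gain is then taken. The dictionary layout and function name are hypothetical.

```python
from statistics import mean

PARTS = ("part1", "part2", "part3")


def section_mean_gains(students):
    """Mean gain (final minus pretest) for each part and the total, for one section.

    students: list of dicts like {"pretest": {...}, "final": {...}} whose inner
    dicts hold the scores on part1, part2 and part3 (hypothetical layout).
    """
    gains = []
    for s in students:
        g = {p: s["final"][p] - s["pretest"][p] for p in PARTS}
        g["total"] = sum(g[p] for p in PARTS)  # total score is the sum of the parts
        gains.append(g)
    return {key: mean(g[key] for g in gains) for key in (*PARTS, "total")}


section = [
    {"pretest": {"part1": 1, "part2": 2, "part3": 3},
     "final": {"part1": 3, "part2": 3, "part3": 5}},
    {"pretest": {"part1": 0, "part2": 0, "part3": 0},
     "final": {"part1": 1, "part2": 1, "part3": 1}},
]
print(section_mean_gains(section))
```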


A general analysis was made first by arranging all the sections into four groups according to the methods used: MS, M, S, and C. The mean and standard deviation were then computed for each group and placed in a small table on the same page as the Analysis of Variance Results.


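A table was prepared for each set of scores. On it the means for each section were recorded so that the sum of means for each class, and the sums of squares for the total and for the blocks, could be obtained. The formulae used were the following. Here x is a section mean, N is the number of section means, B is the sum of the section means in a block, and n is the number of section means in that block:

$$\text{Total SS} = \sum x^2 - \frac{(\sum x)^2}{N}$$

$$\text{Blocks SS} = \sum \frac{B^2}{n} - \frac{(\sum x)^2}{N}$$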
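These sums of squares use the usual correction-term computation, which can be sketched in Python. The sketch assumes the standard textbook form rather than reproducing the article's worksheet. `between_groups_ss` works for classes or for blocks. The section means in the example are hypothetical.

```python
def total_ss(values):
    """Total SS = sum(x^2) - (sum x)^2 / N over all section means."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

def between_groups_ss(groups):
    """SS between groups (classes or blocks) = sum(B^2 / n) - (sum x)^2 / N."""
    values = [x for group in groups for x in group]
    correction = sum(values) ** 2 / len(values)
    return sum(sum(group) ** 2 / len(group) for group in groups) - correction

# Hypothetical section means: two classes with two sections each.
classes = [[1.0, 3.0], [5.0, 7.0]]
all_means = [x for cls in classes for x in cls]
print(total_ss(all_means))          # 20.0
print(between_groups_ss(classes))   # 16.0
```

The 4.0 left over (20.0 minus 16.0) is the within-class variation. The design then splits this variation into method and order-of-presentation terms.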





To obtain the SS for method MS and the SS for the error involved in it, an Interaction Table was constructed for each set of scores. Because MS was not confounded in any of the blocks, it was obtained from all the blocks. The following formula was used to construct the table:

$$MS_{(\text{all blocks})} = MS + C - M - S$$

To obtain the SS of the MS method and of its error, the following formulas were used (RT, row totals; C, cell totals of the Interaction Table):

$$\text{MS total SS} = \frac{1}{24}(RT_1 - RT_2)^2$$

$$\text{Error SS} = \frac{1}{24}(C_{11} + C_{22} - C_{21} - C_{12})^2$$


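Method M was confounded in blocks II, II’, II’’; hence blocks I, I’, I’’ were used to obtain the SS for method M. The following formula was used to obtain M. Here MS, M, S, and C denote the sums of the corresponding section means over the three blocks:

$$\text{Method } M \text{ SS} = \frac{1}{12}(MS + M - S - C)^2 \quad \text{from blocks I, I', I''}$$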


Method S was confounded in blocks I, I’, I’’; hence blocks II, II’, II’’ were used to obtain the SS for method S. The following formula was used to obtain S:

$$\text{Method } S \text{ SS} = \frac{1}{12}(MS + S - M - C)^2 \quad \text{from blocks II, II', II''}$$


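The remaining SS, that of deviation for the error, is actually the deviation due to the order of presentation of the methods in their various combinations. It was obtained from the differences between sections in similar classes. The differences due to the methods used in each class were compared and then summed to obtain the total SS for order of presentation. The formulas used were the following. Lower-case letters denote the section means within one class, and capitals denote the corresponding sums over the three classes:

1. $\text{Error SS} = \frac{1}{2}\sum (ms - s)^2 - \frac{1}{6}(MS - S)^2$ from classes I, I’, I’’
2. $\text{Error SS} = \frac{1}{2}\sum (m - c)^2 - \frac{1}{6}(M - C)^2$ from classes II, II’, II’’
3. $\text{Error SS} = \frac{1}{2}\sum (ms - m)^2 - \frac{1}{6}(MS - M)^2$ from classes III, III’, III’’
4. $\text{Error SS} = \frac{1}{2}\sum (s - c)^2 - \frac{1}{6}(S - C)^2$ from classes IV, IV’, IV’’.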
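These formulas rest on two building blocks, sketched here in Python. The first is the sum of squares for a single-degree-of-freedom contrast, (Σcx)²/Σc². The second is the spread of the within-class differences around their mean, Σd²/2 − (Σd)²/2k. The sketch assumes the standard forms of these computations. The block values are hypothetical, and `contrast_ss` and `residual_ss` are illustrative names.

```python
def contrast_ss(values, coefficients):
    """Single-degree-of-freedom SS: (sum c*x)^2 / sum(c^2)."""
    if len(values) != len(coefficients):
        raise ValueError("one coefficient per value")
    total = sum(c * x for c, x in zip(coefficients, values))
    return total ** 2 / sum(c * c for c in coefficients)

def residual_ss(differences):
    """Order-of-presentation SS from k two-section classes.

    Each d is the difference between the two section means of one class.
    Returns sum(d^2)/2 - (sum d)^2/(2k); for k = 3 the divisor is 6.
    """
    k = len(differences)
    return sum(d * d for d in differences) / 2 - sum(differences) ** 2 / (2 * k)

# Hypothetical (ms, m, s, c) section means in blocks I, I', I''.
blocks = [(12.0, 10.0, 11.0, 9.0), (14.0, 11.0, 12.0, 10.0), (13.0, 12.0, 10.0, 9.0)]
values = [x for block in blocks for x in block]
coefs = [1, 1, -1, -1] * len(blocks)      # MS + M - S - C in every block
print(contrast_ss(values, coefs))          # SS for method M, divisor 12
print(residual_ss([ms - s for ms, _, s, _ in blocks]))  # error SS from (ms - s)
```

With three blocks of four section means the contrast divisor is 12. With three within-class differences the correction divisor is 6.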


The results of the analysis of variance were arranged in standard table form. They cover gains on Parts One, Two, Three, and Total on the initial and final achievement tests, and the A. C. E. total test. The total degrees of freedom are always (N − 1), where N is the number of entries in the analysis. Since there were 24 sections or entries in the table, a total of (N − 1), or 23, degrees of freedom were assigned in the analysis. These were partitioned as follows: eleven degrees for classes; one degree each for methods M, S, and MS; and nine degrees for order of presentation (one degree for each order of presentation of M, S, and MS). Thus a total of 23 degrees of freedom were obtained.


For each entry in the table, the mean square, or variance, was obtained by dividing the sum of squares by its corresponding degrees of freedom. The F test of significance was then applied. The mean square of the comparison being tested was divided by the mean square of the term against which it was tested, giving the F ratio. The F ratio was then compared with the tabled value of F (48, p. 62). The table of F was entered under the corresponding degrees of freedom at the .05 and the .01 levels. If the obtained value is higher than the tabled F value, the finding is significant.


In order to obtain the interaction between the methods M and S, the response to each was determined by using the means for the groups in the following manner, where M(X) denotes the mean of group X:

$$\text{Response to } M = \tfrac{1}{2}\{[M(M) - M(C)] - [M(MS) - M(S)]\}$$

$$\text{Response to } S = \tfrac{1}{2}\{[M(S) - M(C)] - [M(MS) - M(M)]\}$$
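Coding the two responses in Python with hypothetical group means shows that they give the same number. Both expressions reduce algebraically to ½[M(M) + M(S) − M(MS) − M(C)], so either one measures the M-by-S interaction. The function name and the means are illustrative only.

```python
def responses(ms, m, s, c):
    """Responses to M and to S from the group means of MS, M, S, and C."""
    response_m = 0.5 * ((m - c) - (ms - s))
    response_s = 0.5 * ((s - c) - (ms - m))
    return response_m, response_s

# Hypothetical group means (not the study's data).
print(responses(ms=14.0, m=10.0, s=11.0, c=9.0))  # (-1.0, -1.0)
```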


In order to make further comparisons and to determine the relationships between the various parts of the tests, the pretests, and the psychological test, correlation coefficients were computed for Unit One between the following:


Pretest vs. Gains One 
Pretest vs. Gains Two 
Pretest vs. Gains Three 
Pretest vs. Total Gains 


A. C. E. vs. Gains One 
A. C. E. vs. Gains Two 
A. C. E. vs. Gains Three 
A. C. E. vs. Total Gains


Gains One vs. Gains Two 
Gains One vs. Gains Three 
Gains Two vs. Gains Three 
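Each of these coefficients is an ordinary product-moment correlation. A minimal Python version follows. In Python 3.10 and later, `statistics.correlation` computes the same quantity. The score lists are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between paired scores x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical pretest scores and total gains for five students.
pretest = [30, 35, 40, 45, 50]
total_gain = [16, 15, 11, 10, 6]
print(round(pearson_r(pretest, total_gain), 2))
```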


Since students vary in ability level, valid comparisons had to be made from the point of achievement at which each student began. This point is represented by the pretest and A. C. E. scores. Thus, by using these scores as starting points, one could compare them with the gains made by each student.





Section IV 
RESULTS AND ANALYSIS


Complete data were obtained on 396 of the 405 
students originally used, and therefore only 
these students are included in the analysis. Of 
this group of students, 99 were in the sections 
which used the motion pictures and slidefilms 
(MS); 92 used motion pictures only (M); 88 
used slidefilms only (S); and 117 were in the 
control groups (C). 


A. Level of Ability of Groups 


Summary of the means, together with the stan- 
dard deviations for the four major groups (MS, 
M, S, and C), may be found in Table III for the 
psychological test and for the achievement 
tests for both units. 


An inspection of this table shows that the S 
group was lowest in ability level as measured
on the psychological test and the achievement 
tests; the MS group highest on the psychologi- 
cal test; and the M group highest on achieve- 
ment test for both units. However, when the 
difference in means was tested, the ratio did 
not reach the level of significance. 


B. General Analysis 


Since the achievement test for each unit was 
used as a pretest and a final test, the gains 
made by the students on each part of the test 
were used in all analyses. A summary of 
the means and standard deviations for the 
gains on Parts One, Two, Three, and Total 
may be found in Table IV for Unit One and in 
Table V for Unit Two. 


An inspection of these tables shows that the 
S group had the highest score in both units on 
the total gains and on all parts of the tests 
except Part One in Unit One and Part Two in 
Unit Two; and the M group lowest on the total 
gains, as well as on all parts of the tests for 
both units. However, again when the differ- 
ence in means was tested with the F test, no 
statistically significant difference was found to 
exist. 


C. Analysis of Variance Results 


In spite of the fact that the differences between 
the gains of the four total groups on the final 
achievement tests proved to be statistically non- 
significant in the above analysis, it is possible 
that a real difference would be revealed by a 
more sensitive analysis. Hence, the analysis 
of variance was employed in analyzing the data 
















JOURNAL OF EXPERIMENTAL EDUCATION 


obtained on Unit One and Unit Two. 


As pointed out previously, the analysis of var- 
iance is a technique which enables us to validly 
collate results from all groups used in the ex- 
periment from different schools into the var- 
ious categories desired. To put all these groups 
together, as was done in the preceding section, 
and to make comparisons does not give a very 
valid or rigorous basis for inference, for the 
experiment includes intact groups from a num- 
ber of schools. Hence an assignment to meth- 
ods does not represent random selection from 
all the students involved in the experiment, 
even though random selection was used in 
assignment of students in each school to sec- 
tions and of methods to groups. 


The controlled variables in the experiment 
include the differences between the methods 
used. The uncontrolled variables are those 
due to errors of random sampling, teacher 
competence, and differences in the order of 
presentation of the visual materials. Uncon- 
trolled variables cannot be evaluated directly, 
but through statistical randomization we are 
able to make their effect felt in the estimation 
of error. Thus, while we do not know their exact magnitude, we do know within definite limits the extent of variation they introduce. The order-of-presentation term was used as the error term because it included the variation described above. It refers to the point in the sequence of the unit at which the motion pictures and slidefilms were used. Table VI gives the sums of squares and the mean squares obtained for (1) the methods variations, (2) the between-classes variations, and (3) the order-of-presentation variations.


These are given for the A. C. E. test, the pretest, the final test, and gains on Parts One, Two, Three, and Total for each unit. The degrees of freedom assigned were one for each method variance, eleven for the between-classes variance, and nine for the within-classes variance, or order of presentation. The mean square was obtained by dividing each sum of squares by its corresponding degrees of freedom. The variance for each method was divided by the between-classes variance and by the within-classes variance, and an F ratio was obtained for each.
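The F ratios for the three starred entries in Table VI can be recomputed from the tabled sums of squares as a check on the procedure. In this Python sketch the method mean square equals its SS, because it has one degree of freedom. The order-of-presentation mean square is its SS divided by 9. The critical values 5.12 and 10.56 for 1 and 9 degrees of freedom are standard F-table entries supplied here; the article does not quote them. A non-starred row is included for contrast.

```python
# (SS for the method, SS for order of presentation), copied from Table VI.
ROWS = {
    "Unit One, Part Two gains, method MS": (4.66, 5.83),
    "Unit One, Part Three gains, method M": (5.15, 6.17),
    "Unit Two, Part Two gains, method MS": (1.89, 3.06),
    "Unit One, total gains, method S": (4.08, 95.01),
}
F_05 = 5.12   # tabled F for 1 and 9 degrees of freedom, .05 level
F_01 = 10.56  # tabled F for 1 and 9 degrees of freedom, .01 level

def f_test(ss_method, ss_order, df_method=1, df_order=9):
    """Return the F ratio and the level it reaches against the order term."""
    f = (ss_method / df_method) / (ss_order / df_order)
    if f > F_01:
        return f, ".01"
    if f > F_05:
        return f, ".05"
    return f, "not significant"

for label, (ss_m, ss_o) in ROWS.items():
    f, level = f_test(ss_m, ss_o)
    print(f"{label}: F = {f:.2f} ({level})")
```

The three starred entries give F ratios of roughly 7.2, 7.5, and 5.6. These lie above the .05 point but below the .01 point, which agrees with the footnote to Table VI.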


The analysis of variance results may be sum- 
marized as follows: 


1. Level of Ability of Students
a. Psychological test 
No statistically significant difference in 
achievement at the .01 or the .05 level 
was found between classes or within 
classes on either unit. 





b. Nursing Pretests for Units One and Two 
No statistically significant difference in
achievement at the .01 or the .05 level
was found between classes or within
classes on either unit.


Therefore it seems logical to infer that the groups MS, M, S, and C used in the study were of approximately the same ability level, since the evidence did not refute the null hypothesis.


2. Final Achievement Tests for Units One and 
Two 





a. Total test and Gains 


No statistically significant difference in 
achievement at the .01 or the .05 level 
was found when the comparison was made 
between methods and classes on either 
unit. 


Therefore it seems logical to infer that no sig- 
nificant difference in achievement on the tests 
for both units existed between the groups that 
used motion pictures and slidefilms, motion 
pictures alone, slidefilms alone, or neither 
motion pictures nor slidefilms. 


b. Order of Presentation of the Visual Aids


No statistically significant difference in 
achievement at the .01 or the .05 level 
was found between the groups when the 
comparison was made between the meth- 
ods and the order of presentation of the 
visual aids. 


Therefore on the basis of the evidence, it seems 
logical to infer that no significant difference in 
achievement on the tests for Units One and Two 
existed between the groups who used the motion 
pictures at the beginning and middle of the units 
or at the middle and end of the units and the 
groups who used slidefilms at the beginning and 
end of the units or throughout the units. 


c. Parts One, Two, and Three of the Achieve- 
ment Tests 


No statistically significant difference in 
achievement at the .01 level was found 
when the comparison was made between 
classes or within classes variations on 
either unit. 


No statistically significant difference in 
achievement at the .05 level was found 
when the comparison was made between 
methods and classes on either unit. 


A difference in achievement at the .05
level was found on Parts Two and Three
in Unit One and on Part Two in Unit Two
when the comparison was made between
methods and order of presentation.


Since 42 comparisons were made for each unit, one would expect about two comparisons to exceed the .05 tabled values of F by random sampling variation alone. Only two ratios exceeded the tabled values of F in Unit One, and only one did so in Unit Two. Consequently, it is extremely doubtful whether these differences were significant. Therefore, on the basis of the evidence, it seems logical to infer that no significant difference in achievement existed between the groups which used motion pictures and slidefilms, motion pictures alone, slidefilms alone, or neither motion pictures nor slidefilms. This holds for Part One, Application of Principles; Part Two, Facts and Principles; and Part Three, Vocabulary, on the tests for both Unit One and Unit Two.
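The expected number of chance exceedances is the number of comparisons times the significance level: 42 × .05 = 2.1 per unit, and 84 × .05 = 4.2 for both units. The Python sketch below also gives the binomial probability of seeing at least the observed count. It assumes, for simplicity, that the comparisons are independent. They are not strictly independent, since they share the same sections, so the probabilities are only a rough guide.

```python
from math import comb

def expected_exceedances(n_comparisons, alpha=0.05):
    """Expected number of F ratios beyond the alpha point by chance alone."""
    return n_comparisons * alpha

def prob_at_least(k, n_comparisons, alpha=0.05):
    """P(at least k exceedances), treating the comparisons as independent."""
    return sum(comb(n_comparisons, i) * alpha ** i * (1 - alpha) ** (n_comparisons - i)
               for i in range(k, n_comparisons + 1))

# (comparisons, observed exceedances): Unit One, Unit Two, both units.
for n, observed in [(42, 2), (42, 1), (84, 3)]:
    print(n, expected_exceedances(n), round(prob_at_least(observed, n), 2))
```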


D. Summary of Analysis of Results 


The means on the nursing pretest and the fin- 
al achievement test for both units were present- 
ed for all the sections participating in the ex- 
periment. They were reported separately for 
each unit and grouped according to the methods 
used in the experiment. The differences in the 
means obtained by the groups for both Unit One 
and Unit Two when subjected to the F-test at 
the .05 level revealed no significant difference 
in achievement by any of the groups. 


The analysis of variance results for Units One and Two showed no statistically significant difference in achievement on the nursing achievement tests by any of the groups at the .01 or the .05 level. This was true whether the F ratio was obtained by comparing methods with classes or by comparing methods with the order of presentation of the visual aids. Even under the less rigorous .05 standard, no statistically significant difference was found when the different parts of the tests were considered separately. Out of the 84 comparisons for both units, only three ratios were above the tabled values of F. With this number of comparisons, one would expect random sampling variation alone to push about four above the tabled values of F at the .05 level. On the basis of the evidence, therefore, it seems logical to infer that no significant difference in achievement existed between the groups who used the motion pictures and slidefilms together, the motion pictures alone, the slidefilms alone, or neither the motion pictures nor the slidefilms. The differences which did exist could be attributed to random sampling variation.




















Suppose the analysis of results had stopped with the general analysis of differences in means, using the critical ratio or a similar test to determine the significance of the differences in the group means, as was done in most of the experiments reviewed. In that case the results would in all probability have been different, and they might even have been significant according to these tests. In this experiment, however, the analysis was carried much further. It was based on a carefully designed factorial pattern, which provided a better inferential basis and permitted a more rigorous analysis.


E. Relationship Between Psychological Test 
Performance and Gains Made on Unit One 


In order to determine the relationship between 
the level of ability as measured on the psychol- 
ogical test and the gains made on each part of 
the achievement test by the various groups, the 
correlation coefficient was computed between 
the gains made on each of the parts of the test 
for each total group and the psychological test 
scores. These relationships are exhibited in Table VII.

The outstanding fact to be noted from the data of this table is that the relationships are extremely low. Only two correlation coefficients are over .19, and most of them are below .10. For the size of sample used here, a correlation of .19 is required for significance at the .05 level and one of .25 at the .01 level. Of the 16 entries in the table, two are above the tabled values for significance at the .01 level. Both were found when the comparisons were made between the psychological test and gains on Part Three. Since the gains made on Part Three are on vocabulary, it seems likely that a significant relationship existed. All the other observed relationships are well below the level at which 5 out of 100 could be expected by chance variation.
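The critical values of r quoted here follow from the t distribution through r = t/√(t² + df), with df = N − 2. The Python sketch below uses two-tailed t values from a standard t table; these are assumed here and are not quoted in the article. At 100 degrees of freedom it reproduces the .19 and .25 used in this section. At 80 it gives .22 and .28, the same values printed beneath the table relating pretest scores and gains.

```python
from math import sqrt

def critical_r(t_crit, df):
    """Smallest |r| reaching significance, given the two-tailed t critical value.

    Uses r = t / sqrt(t^2 + df), with df = N - 2 for N paired scores.
    """
    return t_crit / sqrt(t_crit ** 2 + df)

# Two-tailed t critical values from a standard t table (assumed): df -> (.05, .01)
T_TABLE = {100: (1.984, 2.626), 80: (1.990, 2.639)}
for df, (t05, t01) in T_TABLE.items():
    print(df, round(critical_r(t05, df), 2), round(critical_r(t01, df), 2))
```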


F. Relationship Between Initial Perform- 
ance on Achievement Test and Gains 
Made on Unit One 


Correlation coefficients were computed be- 
tween the scores made on the pretest and the 
gains made for each part of the test for each 
of the four groups. These relationships are ex- 
hibited in Table VIII. 


An inspection of Table VIII indicated a signif- 
icant relationship, although not very high, be- 
tween: 


Pretest and Gains, Part One at the .01 level 
for all groups 

Pretest and Gains, Part Two at the .01 level 
for the S and Control groups and at the .05 
level for all groups 

Pretest and Gains, Part Three at the .01 
level for the MS and Control groups and at 
the .05 level for M and S groups 
Pretest and total Gains at the .01 level for 
all groups 


Therefore, on the basis of the evidence, it seems logical to infer two things. First, a statistically significant relationship exists between the initial performance of the students and the gains made on the achievement test: at the .05 level in all cases, and at the .01 level in all cases except four. Second, it is therefore possible to predict the success of student nurses on the achievement test as constructed for this experiment. The correlation coefficients obtained between the pretest and the gains made on the achievement test show a negative relationship, as can be noted from Table VIII. That is, the students who made the highest scores on the initial test made the lowest gains as measured on the achievement test. These findings suggest that the tests may not have had a sufficiently high level of difficulty. Attention should therefore be directed to three factors. First, the type of test used: over half of the items called for a selection of principles and the application of these principles to selected nursing activities. Second, the judgment of the teachers who used the tests in the experiment, who rated the tests as having a large number of difficult items. Third, the level of difficulty as revealed by the item analysis. A summary of the item analysis is given in Table IX.


An inspection of this table indicates that, on the final test for both units, the average item fell in the intervals of 31 to 50 students answering correctly. The test on Unit One had 90 items in all. The highest score made on this test by any student was 72, and the mean score for the group was 41. The test for Unit Two had 85 items. The highest score made on this test by any student was 65, and the mean score for the group was around 37. Thus it would seem that the test was sufficiently difficult, since the average score on each test was around 50 per cent of the total items.
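The percentages behind this statement can be checked with a few lines of Python, using only the figures reported above. The Unit Two mean is given only as "around 37", so its percentage is approximate.

```python
# (mean score, highest score, number of items) as reported for each final test.
FINAL_TESTS = {"Unit One": (41, 72, 90), "Unit Two": (37, 65, 85)}

def percent_of_items(score, n_items):
    """Score expressed as a percentage of the items on the test."""
    return 100 * score / n_items

for unit, (mean_score, top_score, n_items) in FINAL_TESTS.items():
    print(f"{unit}: mean {percent_of_items(mean_score, n_items):.1f}%, "
          f"highest {percent_of_items(top_score, n_items):.1f}%")
```

The means correspond to about 46 and 44 per cent of the items. The highest scores correspond to 80 and about 76 per cent.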


Correlation coefficients were also computed 
between the gains made on the various parts of 
the achievement tests in order to determine if 
there were any relationships present. Significant relationships were found between the various comparisons of Gains One, Two, and Three for all groups except the motion picture group.


G. Analysis of Attitude Scale 


The preceding discussion has focused on comparisons of the various methods groups on the achievement tests in the course Introduction to Nursing Arts. As was pointed out earlier, the effectiveness of any teaching method is determined in part by the students’ reactions to the use of the method. Therefore, an attempt was made to obtain the students’ opinions of the use of motion pictures and slidefilms in the teaching of Nursing Arts.


The rating scale was presented to the students 
immediately after the final test on Unit Two. 
All the students who used either the motion 
pictures or the slidefilms in either one of the 
units were placed together in group MS. Since 
some of the sections used motion pictures in 
Unit One and then by random selection were as- 
signed to use slidefilms in Unit Two, the number 
of students using either of the two methods con- 
stituted over one-half of the students, or 206. 
The number of students coming under the other 
categories were: (1) 92 for motion pictures only, 
(2) 40 for slidefilms only, and (3) 58 for the con- 
trol groups who had neither motion pictures nor 
slidefilms. Scores were tabulated for each 
statement according to the points used in each 
part of the scale and then weighted and averaged.
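The weighting and averaging step might look like the following Python sketch. The article does not state the scale points or their weights. The five-point scale, the weights 5 to 1, and the response counts are therefore all hypothetical.

```python
def weighted_mean_rating(counts, weights):
    """Weighted average scale value for one statement.

    counts[i]  -- number of students who chose scale point i
    weights[i] -- points assigned to scale point i (hypothetical scheme)
    """
    if len(counts) != len(weights):
        raise ValueError("one weight per scale point")
    n = sum(counts)
    if n == 0:
        raise ValueError("no responses")
    return sum(c * w for c, w in zip(counts, weights)) / n

# Hypothetical five-point scale and response counts for one statement.
weights = [5, 4, 3, 2, 1]   # strongly agree ... strongly disagree
counts = [40, 30, 20, 6, 4]
print(weighted_mean_rating(counts, weights))
```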


The effect on attitudes may be summarized as 
follows: 


1. The students strongly agreed that the 
motion pictures helped them in the units 
studied. 


2. The students did not agree about the value 
of slidefilms nor as to whether they pre- 
ferred the slidefilms or the motion pic- 
tures. 


3. All students agreed that they would like to 
see similar motion pictures used in other 
units in Nursing Arts. 


4. All the students grouped together gave the 
following ratings to motion pictures and 
slidefilms as used in the study of nursing 
activities in Unit One and Unit Two: 

First-- Motion Pictures 

Second-- Motion Pictures and Slidefilms 
used together 

Third--Slidefilms 

Fourth--Neither motion pictures nor 
slidefilms 


5. Students preferred two showings of motion pictures to one showing. They expressed no preference between use at the beginning and middle of the unit and use at the middle and end of the unit.


6. Students preferred the use of slidefilms 
with the lectures to their use as merely 
introductory and summary devices. 


7. All the students grouped together rated the functions of motion pictures and slidefilms as follows:

                                         Motion      Slide-
                                         Pictures    films
a. Making learning easier                  1st        1st
b. Promoting interest                      2nd        3rd
c. Increasing amount of learning           3rd        2nd
d. Stimulating to further study            4th        4th

Section V 
SUMMARY AND CONCLUSIONS 


The purpose of the study was to obtain data to 
test the null hypotheses which were set up re- 
garding the effectiveness of motion pictures and 
slidefilms for the learning of certain nursing 
activities taught in the course Introduction to 
Nursing Arts. The typical teaching situation as 
it exists in schools of nursing in the State of 
Indiana was preserved insofar as possible. The 
data obtained included measurement of achievement on (1) vocabulary, (2) facts and principles,
and (3) application of principles. In the experi- 
ment, emphasis was placed on the understanding 
of the scientific principles underlying the nurs- 
ing activities studied. The activities included 
those concerned with the taking of temperature, 
pulse, respiration and blood pressure, and those 
concerned with the therapeutic uses of heat and 
cold. Data were also obtained on the students’
reactions and attitudes toward the effectiveness
of the use of audio-visual materials in the learn- 
ing of nursing activities. 


A 2x2 factorial design with partial confounding of M and S effects was used. In it the motion pictures were designated as M, the slidefilms as S, motion pictures and slidefilms used together as MS, and the control groups as
C. The motion pictures and slidefilms were 
used in two different orders of presentation 
each. Assignment to methods and sections was 
done by random selection using Fisher’s Table 
of Random Numbers. 


Unit plans and achievement tests were especially constructed for the experiment, and both were tried out in a preliminary experiment. Data from this experiment were used for item analysis and to try out the planned experimental method.


Analysis of variance was employed in the an- 
alysis of the data obtained in the experiment; 
therefore, sums of squares and mean squares 
were computed for (1) each method - MS, M, S, 
(2) between classes or school differences, and 
(3) within classes or order of presentation vari- 
able. The F test of significance was used to de- 
termine whether the results obtained were sig- 
nificant. 


No statistically significant difference in achievement was found, even at the .05 level, between the groups that used motion pictures, slidefilms, a combination of motion pictures and slidefilms, or neither motion pictures nor slidefilms. This held for comparisons made between methods and classes and for those made between methods and order of presentation. In view of this, it seems logical to infer that the motion pictures and slidefilms made no contribution significantly greater than that of the control groups to the learning of these units in Nursing Arts.


Careful effort was made in the experiment to 
control those things which could be controlled. 
Special units were made in detail including out- 
line of objectives and subject matter content so 
that identical content was covered by all the 
teachers participating in the experiment. Spec- 
ial tests were constructed which were used as 
pretests and final tests on the units. Both the 
units and tests were submitted to subject-matter 
experts and experienced teachers of Nursing 
Arts for their evaluation and suggestions. Both 
units and tests were used in a preliminary ex- 
periment. Those factors which could not be con- 
trolled were randomized so that their effects 
could enter into the estimate of error term. 
Therefore, on the basis of the type of experiment 
used, the kind of analysis made, and on the evi- 
dence found it seems logical to infer that motion 
pictures and slidefilms made no significantly 
greater contribution than that of the control 
groups to the learning of the units in Nursing 
Arts in this experiment. 


The findings of this experiment, it would seem 
to the writer, further emphasize the need for 
many more carefully designed and statistically 
analyzed studies before the many problems re- 
lating to audio-visual education can be answered. 
A review of the research showed the lack of a 
systematic approach to the study of the problems 
in this field. It would seem essential that before 
any of the problems in the field can be critically 
evaluated and solved, a systematic rationale for 
their study needs to be developed. If a system- 
atic and continuous pattern of investigation were 
set up, then the many problems which need in- 
vestigation could be correlated and unified; this 
would result in a much deeper and more com- 
prehensive study of the field than has hereto- 
fore been made. The types of learning which 
each audio-visual aid will best produce could
be related to the specific aid and in turn re- 
lated to how the aid could be used most effec- 
tively. The influence of these aids on the spec- 
ific content of the unit could also be related to 
the level of the material in terms of the ma- 
turity of the learner. And still further, the 
combining of the audio-visual materials with 
other learning experiences to produce maxi- 
mum educational effectiveness could be inves- 
tigated. 
FIGURE 1


DESIGN OF THE EXPERIMENT


Two Factors, M and S, at Three Levels, in 24 Treatments

M - Motion Pictures
S - Slidefilms
MS - Combination of Motion Pictures and Slidefilms
C - Control
Mpm - Presented at the beginning and middle of the unit
Mme - Presented at the middle and end of the unit
St - Presented throughout the unit
Sbe - Presented at the beginning and end of the unit

Square  Class  Sections       Square  Class  Sections       Square  Class  Sections
I       (1)    MpmSt, St      I’      (3)    MmeSt, St      I’’     (5)    MpmSbe, Sbe
II      (2)    Mpm, C         II’     (4)    Mme, C         II’’    (6)    Mpm, C
III     (7)    MpmSt, Mpm     III’    (9)    MmeSt, Mme     III’’   (11)   MpmSbe, Mpm
IV      (8)    St, C          IV’     (10)   St, C          IV’’    (12)   Sbe, C

Each small square represents a class of students and is numbered from 1 to 12. Each of the 24 entries in the table represents a section.* Each large block represents a complete replication of methods and is referred to as a block.

Factor S is confounded in blocks I, I’, I’’.

Factor M is confounded in blocks II, II’, II’’.

*There were 5 additional sections used, and they were designated as control groups, making a total of 29 sections used in the experiment.
TABLE II


SUMMARY OF COMBINATIONS AND REPLICATIONS USED 
IN EXPERIMENT 








Method & Order Replications 
of Presentation 























TABLE III


SUMMARY OF MEANS AND STANDARD DEVIATIONS OBTAINED 
BY THE FOUR TOTAL GROUPS ON INITIAL ACHIEVEMENT TESTS 
AND ON PSYCHOLOGICAL TEST 











Psycho Unit One 
M " M s 
TABLE IV


COMPARISON OF GAINS OF TOTAL GROUPS OF STUDENTS FOR EACH METHOD 
ON VARIOUS PARTS OF ACHIEVEMENT TESTS - UNIT ONE


(MS) N=99 (M) N=92 (S) N=88 (C) N=117 








Parts of Mean Gain S.D. of Mean Gain





Tests MS M S





Total Test 13.75 9.42 
Part I 6.57 6.02 
Part II 2.84 . 2.69 
Part III 5.59 4.35 4.16





























TABLE V 


COMPARISON OF TOTAL GROUPS OF STUDENTS FOR EACH METHOD ON VARIOUS
PARTS OF ACHIEVEMENT TESTS - UNIT TWO


(MS) N=100 (M) N=93 (S) N=88 (C) N=115 





Parts of Mean Gain S.D. of Mean Gain 








6.35 
2.22 
3.99 
TABLE VI
SUMMARY OF ANALYSIS OF VARIANCE RESULTS

                                   Methods                         Order of
Tests                        MS       M        S      Classes   Presentation    Total
D.F.                          1       1        1        11           9            23
A.C.E.                 SS   2.50   329.49    93.19   3196.97     1368.46      4990.61
                       MS                             290.63      152.05       216.98
Pretest (Unit One)     SS   2.77    12.16     9.65    620.23      104.85       749.66
                       MS                              56.38       11.65        32.59
Pretest (Unit Two)     SS    .41      .03      .16    334.19       63.83       398.62
                       MS                              30.38        7.09        17.33
Final Test (Unit One)  SS   1.48    17.76    26.26    323.09      109.80       478.39
                       MS                              29.37       12.20        20.80
Final Test (Unit Two)  SS    .43     1.78     1.39    506.28       68.75       578.63
                       MS                              46.02        7.64        25.16
Gains - Unit One
  Total                SS    .20      .53     4.08    817.04       95.01       916.86
                       MS                              74.28       10.56        39.86
  Part One             SS   1.29     6.34     4.33    164.65       57.82       234.43
                       MS                              14.97        6.42        10.19
  Part Two             SS   4.66*     .96      .54     61.76        5.83        73.75
                       MS                               5.61         .65         3.20
  Part Three           SS    .34     5.15*     .63    125.87        6.17       138.16
                       MS                              11.44         .69         6.00
Gains - Unit Two
  Total                SS   1.24     3.13     2.50    851.94       26.37       885.18
                       MS                              77.45        2.92        38.49
  Part One             SS   1.49      .05     3.10    264.36       12.08       281.08
                       MS                              24.03        1.34        12.22
  Part Two             SS   1.89*     .03      .79     20.76        3.06        26.53
                       MS                               1.89         .34         1.15
  Part Three           SS    .33      .72      .49    102.17        7.18       110.89
                       MS                               9.29         .80         4.82


























*The only F ratios obtained above tabled values at the .05 level (none at .01 level) when 
methods variance was divided by order of presentation variance. 
TABLE VII


RELATIONSHIP BETWEEN PSYCHOLOGICAL TEST AND GAINS ON EACH
PART OF THE ACHIEVEMENT TEST - UNIT ONE

                                          Correlation Coefficient
Variable                                  MS      M       S       C
Psychological Test vs Gains I             .18     .08    -.08     .04
Psychological Test vs Gains II            .03     .05     .05    -.08
Psychological Test vs Gains III           .29    -.10    -.62     .02
Psychological Test vs Total Gains        -.05    -.08     .17     .18





TABLE VIII


RELATIONSHIP BETWEEN INITIAL PERFORMANCE AS MEASURED ON PRETEST
AND GAINS MADE ON EACH PART OF THE ACHIEVEMENT TEST - UNIT ONE




















(MS) N=99 (M) N=92 (S) N=88 (C) N=117 
Correlation Coefficient 
Variable MS M S C
Pretest vs Gains I -.50 -.39 -.57 -.£3 
Pretest vs Gains II -.17 -.20 -.29 "6 
Pretest vs Gains III -.37 -.19 -.18 -.31
Pretest vs Total Gains -.62 -.43 -.48 -.55 








Tabled values of the correlation coefficient required for significance when N = 80: .28 at the .01 level and .22 at the .05 level.


TABLE IX
ITEM DIFFICULTY ACCORDING TO NUMBER OF ITEMS ANSWERED CORRECTLY 
BY STUDENTS 
































Students answering     Items on Test One          Items on Test Two
correctly             Pretest    Final Test      Pretest    Final Test

61-70                    0           3              0           2
51-60                    3          12              0          22
41-50                    2          32              4          19
31-40                   13          16             13          13
21-30                   17           8             16          13
11-20                   14           4             14           3
1-10                    27           1             26           1
Total                   76*         76             73*         73








*This item analysis was made on the preliminary tests. Seventy-six items in Test One and 73 items in Test Two were retained; the remaining items used were new.
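You can check the column totals of Table XX against the footnote (76 and 73 items) and summarize the pretest-to-final shift. The sketch below uses only the counts printed in the table. The "more than 40 students" cutoff is an arbitrary choice made for illustration.

```python
# Table XX: number of items falling in each band of "students answering correctly".
bands = ["61-70", "51-60", "41-50", "31-40", "21-30", "11-20", "1-10"]
items = {
    ("Test One", "Pretest"): [0, 3, 2, 13, 17, 14, 27],
    ("Test One", "Final"): [3, 12, 32, 16, 8, 4, 1],
    ("Test Two", "Pretest"): [0, 0, 4, 13, 16, 14, 26],
    ("Test Two", "Final"): [2, 22, 19, 13, 13, 3, 1],
}

# Column totals should equal the number of retained items (76 and 73).
totals = {key: sum(counts) for key, counts in items.items()}
# Items answered correctly by more than 40 students: the first three bands.
easy = {key: sum(counts[:3]) for key, counts in items.items()}

for (test, when), total in totals.items():
    print(f"{test} {when}: {total} items, {easy[(test, when)]} answered by 41+ students")
```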

A SUMMARY OF MENTAL HEALTH SURVEY OF
SPARTANBURG COUNTY, SPARTANBURG,
SOUTH CAROLINA*


E. C. HUNTER
Tulane University 


I. Problem and Procedure 





For most people the cause and nature of men- 
tal and emotional difficulties remain obscure. 
Unhappiness, depression, and maladjustment 
are generally regarded as inevitable and normal. 
Actually these conditions are not necessary and 
may be reduced or eliminated if causes are 
analyzed and proper steps are taken to rebuild 
attitudes and behavior. 


The purpose of this study was to investigate 
the mental health status of a representative 
sampling of the population in the city and county 
of Spartanburg, South Carolina. Attempts were 
made to identify mental health difficulties need- 
ing alleviation, and mental health assets needing 
improvement and amplification. Table I shows 
the population sampling making up the Spartan- 
burg Mental Health Survey by school grade or 
group, sex, and race. Approximately 200 cases 
were obtained for each of nine groups, while 
the average for each of sixteen groups was 164. 
A special effort was made to cover the city and 
county geographically and to include all types 
of schools and communities in the sample. 
Forty-three white schools and twenty-six col- 
ored schools administered tests in 144 class- 
rooms, covering all school grades from one 
through eleven. Approximately one-fourth of 
the cases in the school groups were colored 
pupils. This is the proportion of total colored 
school enrollment in the county. The college
groups consisted of a sampling of white stud- 
ents only in the three colleges located in the 
city of Spartanburg. The adult group was made 
up of members of various occupational and so- 
cio-economic levels, including white and color- 
ed individuals of both sexes in the city and 
county. 


The California Test of Personality was used 
in the primary grades. This test consists of 
twelve sub-tests covering Self Adjustment and 
Social Adjustment. Reliability and validity of 
the 96 items are adequate for survey purposes. 
For grades and levels above the third, the 
Mental Health Analysis test was used. The
Elementary, Intermediate, Secondary, and Adult
Series of the test were used at appropriate
levels. The test comprises ten sub-tests cov- 
ering Mental Health Liabilities and Mental 
Health Assets. Each sub-test contains 20 
items. The reliability of the test is reported 
to be .96. Both tests were constructed by 
Thorpe, Clark, and Tiegs of the University of 
Southern California and published by the Cali- 
fornia Test Bureau. 


II. Results


Primary Grades


Table II shows that the Spartanburg sample of
575 pupils (425 whites, 150 colored) in the pri- 
mary grades (1, 2, 3) achieved medians above 
the norms in every part of the test. Highest 
scores were made on Social Adjustment while 
only slightly better than typical scores were 
made on Self Adjustment by the Spartanburg 
pupils. The Spartanburg median of 80.2 is e- 
quivalent to the 75 percentile. The range cov- 
ered the entire breadth of each sub-test with 
some zero and several perfect scores, indi- 
cating the unsuitability of the test for many 
pupils. The most favorable scores were made 
on parts dealing with feelings of personal 
worth and school relations. 


Upper Grades and Group Totals


Table III reveals that the Spartanburg sample of 2372 individuals (1906 white, 466 colored) on the Mental Health Analysis made scores somewhat above the norms on four of the five Liabilities sub-tests and on all Assets sub-tests. On Emotional Instability, Spartanburg scores were equivalent to the 43 percentile, seven percentile points below the norms. Highest scores were made on the Physical Defects, Close Personal Relationships, and Inter-Personal Skills sub-tests. Results indicated that Spartanburg individuals from the fourth grade up to adults were more than normally sensitive and tense, and given to excessive self-concern, to escaping from responsibilities, and to making excuses for failures. On the test as a whole, the Spartanburg median was equivalent to the 59 percentile. The range of scores was great on all sub-tests, with few very low scores and many perfect scores, especially on Physical Defects and Close Personal Relationships.


*This study was made possible through a grant by the Converse College Research Committee on the Carnegie Foundation for Grants-In-Aid to — Staffs, Spartanburg, South Carolina.


Results of Grades One, Two, and Three 


Table IV shows the medians for grades one, 
two, and three on the California Test of Person- 
ality, Primary Series. On all sub-tests the 
Spartanburg pupils were above the 50 percen- 
tile. Higher scores were made on Social than 
on Self-Adjustment. Pupils in grades two and 
three made slightly higher scores than pupils 
in grade one on all sub-tests except Freedom
from Nervous Symptoms, in which case there 
appeared to be a gradual deterioration from 
grade one to grade three. On Social Adjust- 
ment definite increases in scores were made 
from the first through the third grade. No 
appreciable increases in scores were made on 
Self-Adjustment by second graders over first 
or by third over second graders. On the test 
as a whole, Spartanburg primary pupils were 
surpassed by only slightly more than one- 
fourth of the pupils on whom the norms of the 
test were based. As a group, the primary 
grade pupils were not significantly low in any 
part of the test. Individual profiles revealed 
some serious mental health deficiencies and 
adjustment needs. Approximately seven per- 
cent, or 40 of the 575 primary pupils tested, 
made extremely low scores. 





Results of Upper Grades and Groups 


Table V presents approximate percentile values for medians of the various grades and of the Spartanburg sample on the Mental Health Analysis. Percentiles for each part of the test, the Liabilities total, the Assets total, and the Total Score are shown. Low percentiles indicate unfavorable results; percentiles of 50 and higher indicate increasingly favorable results in comparison with the norms established by the authors of the test.


With respect to Behavioral Immaturity (L-A), percentile values for eight groups appeared to hover around 50; for five groups, substantially above 50; while for one group, the adults, the percentile was 35. College juniors and seniors showed the most favorable standing on this section. Median percentiles for the sixth, seventh, and tenth grades were approximately ten points above the norms. The veteran group, made up largely of college freshmen and sophomores, reached the 49th percentile, only very slightly below that of the total freshman or sophomore group. The percentile for all groups combined, 2372 individuals, was 52. By way of explaining the one unfavorable median percentile, that of the adults, it should be pointed out that this group represented wide divergencies in background, education, and experience. It was the most heterogeneous of all groups. Forty-four percent of the members of the adult group were colored people, whereas only about 20 percent of the membership of the school groups were colored pupils, and the college and veteran groups contained no colored members. Colored groups made lower scores than whites. Since the adult group contained a larger proportion of colored members, the relatively low percentile of the adult group may be partially explained on this basis. In addition, something of the actual conflicts, difficulties, and tensions of the times may be reflected in the adult score.


On the Emotional Instability sub-test (L-B), 
eight groups fell well below the 50 percentile
norm; three groups showed percentiles very 
close to the norm; while the college juniors 
and seniors and the eleventh grade achieved 
the 60 percentile. The combined percentile 
was only 43. As on the preceding sub-test, the 
adults made the least satisfactory score, reach- 
ing only the 32 percentile. Again it may be said 
that the confusion, uncertainties, and insecur- 
ities of today impinge with greater force on 
adults than on young people of school and col- 
lege age. At the same time, it may be claimed
that youth are being better prepared in and out 
of school to handle their emotional conflicts and, 
as a result, achieve better scores on a test of 
this kind. It should be kept in mind that here 
we are dealing only with central tendencies of 
groups. Individual cases in all groups and in 
all sections of the test made low scores, while 
other individuals made high scores. 


On the sub-test measuring Feelings of Inadequacy (L-C), the fifth and eleventh grades and the adult group showed percentiles well below the 50 percentile. The eleventh grade, for reasons unknown, fell to the 22 percentile on this section. The fourth, ninth, and tenth grades and the college freshmen reached percentile equivalents in the neighborhood of 50, while the sixth, seventh, and eighth grades and the college sophomores, juniors, seniors, and veteran groups showed percentiles substantially above the norm. The combined percentile was 53. The junior high school and college groups showed the most favorable results on this section. A gradual increase in percentile values from the freshman to the senior college year was noted. Apparently, college experience is accompanied by strengthened feelings of competency or adequacy with respect to skills, general ability, and faith in the future.


Except for grades four and five, whose percentiles were very nearly 50, all groups showed quite favorable results on the Physical Defects sub-test (L-D), with a percentile range from 60 to 95. The combined median percentile value was 81. In general, the higher the grade or age, the higher the scores on this scale. The college, adult, and veteran groups showed very high scores, with percentiles of 90 or above. The large number of perfect scores suggests that, above the elementary school level, the items on this sub-test are not suitable and have very limited value.


On Nervous Manifestations (L-E), all groups 
reached or exceeded the 50 percentile except 
grades four and five whose values were 33 and 
35 respectively. The eighth grade and three 
college groups each reached the 70 percentile. 
The veterans were 18 percentile points above 
the norm, while the adults reached the 57 per- 
centile. The value for all groups was 58. 
Thus, the Spartanburg sample of 2372 individ- 
uals appeared to possess slightly fewer mani- 
festations of nervousness from physical or 
emotional causes than was true of the sample 
upon which the test was standardized. 


Although based on separate tabulations, the 
Liabilities Score (L-S) combines the five Lia- 
bilities sub-tests and summarizes the mental 
health liabilities for the various grades and 
groups. These are the factors which should be 
minimized, reduced, corrected, or eliminated 
as far as possible. Liabilities percentiles for 
the fourth and fifth grades indicated the pres- 
ence of many such factors. The median Lia- 
bilities scores for these grades were 10 or 
more percentile points below the norm. It is 
interesting to note the wave-like variation in 
Liabilities percentiles from the elementary 
grades up to adults. The average percentile 
for the upper elementary grades was 42; for 
the two junior high grades, 54; for the three 
senior high grades, 47; for the four college 
years, 65; and for adults, 45. On the five Lia- 
bilities sub-tests combined, the Spartanburg 
sample of 2372 individuals showed a percentile 
equivalent of 53. 


On each of the Assets sub-tests of the Mental
Health Analysis, all grades and groups
of the Spartanburg sample exceeded the 50 per- 
centile. With respect to Close Personal Re- 
lationships (A-A), a gradual increase in scores 
from the fourth grade through the college years 
was noted, indicating a gradual growth in
desirable personal relationships with advance in
educational level. The combined percentile 
equivalent for this scale was 83, the highest 
total percentile for any part of the test. On the 
Inter-Personal Skills sub-test (A-B), the com- 
bined percentile was 77, with the upper elem- 
entary and junior high grades and the college 
groups making substantially higher scores than 
the senior high grades. The reverse of this 
trend occurred on the Social Participation
sub-test (A-C), which revealed the senior high
grades as achieving higher percentile values 
than the lower grades or the college groups. 
Relatively lower scores for the latter groups 
may be in some measure related to the ever 
increasing amounts of individual work and de- 
creasing amounts of group activity requiring 
cooperation and mutuality at the college level. 
Social participation is related to the feeling of 
status and belonging and is easily promoted in 
any group where ample cooperative enterprises 
are provided. 


With respect to the Satisfying Work and Rec- 
reation sub-test (A-D), all grades and groups 
were only slightly above the 50 percentile, ex- 
cept the adult group which reached the 84 per- 
centile. The combined percentile was 61, the 
lowest of any of the Assets sub-tests. On the 
last Assets sub-test, Outlook and Goals (A-E), 
the combined percentile was 77 with a fairly 
gradual increase from 58 in the fourth grade 
to 94 in the college senior group. Apparently, 
there is a rough relationship between a satis- 
fying philosophy of life and educational level. 
Education and training lead toward more worthy 
goals in harmony with socially acceptable, eth- 
ical, and moral principles. 


On the five Assets sub-tests combined (A-S), 
the Spartanburg groups ranged in percentile 
equivalents from 60 to 91 with a median total 
percentile of 68. The wave-like variation on 
Assets percentiles from the elementary grades 
through the college years was similar to that 
for the Liabilities. On the Assets the average 
percentile for the three upper elementary 
grades was 64; for the two junior high grades 
seven and eight, 68; for the senior high grades 
nine, ten, and eleven, 62; and for the four 
college years, 76. The adults showed a very 
high percentile of 91 on the Assets sub-tests.
All groups combined on the Assets reached the 
68 percentile. 


With respect to Total Score percentiles, the 
range was from 48 in the fifth grade to 75 in the 
college junior and senior groups. Perhaps the 
most interesting observation to be made from 
Table V is the variation in percentiles for the
various units of the school system. On the Men- 
tal Health Analysis as a whole, the average per- 
centile for the fourth, fifth, and sixth grades 
was 52; for the junior high grades, 58; for the 
senior high grades, 52; for the college groups, 
70; and for adults, 66. The test scores seemed 
to indicate that the junior high school in Spar- 
tanburg is making a somewhat greater contri- 
bution to the mental health of pupils than the 
senior high school. The upward trend in scores 
was not maintained in the senior high. Percen- 
tiles based on medians actually minimize the 
difference. On the test as a whole, the Spartan- 
burg sample showed a 15 percent better than 
average mental health status. The few de-
ficiencies occurred in the Liabilities categor- 
ies, the most significant appearing in the 
Emotional Instability sub-test. This is the 
only category in which the total group fell be- 
low the norms. Again, it is emphasized that 
interpretations based on central tendencies 
should not obscure the fact of range of scores. 
Actually, many individuals in all groups and on 
practically all sections of the test made low 
scores, indicating mental health deficiencies 
of a more or less serious nature. It is safe to 
say that five to ten percent of all pupils in 
grades four through eleven need special atten- 
tion devoted to adjustment difficulties, partic- 
ularly in the realm of emotional conflicts. It
should be recalled that this was the general 
area of lowest scores made by primary pupils 
on the personality test. Both tests revealed 
the specific need in Spartanburg schools of 
helping pupils resolve their emotional con- 
flicts, relieve their anxieties, and develop 
more emotionally secure and independent per- 
sonalities. 


Results by Type and Size of School in 
Primary Grades 


Table VI presents medians and correspond- 
ing percentiles by size and type of school for 
the primary grades in white schools on the 
California Test of Personality. The number of cases is small for mill schools and schools with one to three teachers; however, the sampling was carefully made. From statistical calculations not shown here, it was determined that a central tendency difference of approximately three points with as few as 50 cases yielded a statistically reliable difference between groups for Self and Social Adjustment. With 100 cases in the groups, a central tendency difference of two points yielded a reliable difference, given the relatively small standard deviations of these groups. On Total Adjustment, a central tendency difference of four to six points between any two groups, depending on the number of cases, yielded a reliable difference. Accordingly, it may be pointed out from Table VI that
children in urban schools made significantly 
higher Self, Social, and Total Adjustment scores 
than did children in rural schools, mill schools, 
1-3 teacher schools, or in 4-6 teacher schools. 
Some overlapping of cases occurred in tabu- 
lations for urban schools and schools with more 
than 7 teachers. As a result, comparisons 
among these groups should not be made. It is 
interesting to note, however, that the second 
highest adjustment scores were made by 
schools with 7-10 teachers. Apparently, these 
data in part substantiate the claim that a mod- 
erately sized school with 7-10 teachers accom- 
plishes better results in personality develop- 
ment and in self and social adjustment than do 
smaller or larger schools. Lowest scores were 
made by children in the 1-3 teacher schools. 
All the medians in this analysis were at or
above the 50 percentile. The urban schools and 
the 7-10 teacher schools reached the 95 per- 
centile in Social Adjustment. 
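The survey does not state how "statistically reliable" was determined. A common convention of the period was the critical ratio: the difference between two group means divided by the standard error of that difference, taken as reliable at about 3.0 (or 2.58 for the .01 level). The sketch below makes three assumptions: independent groups, means rather than medians, and a hypothetical standard deviation of 5 points. None of these values comes from the study. For medians, the standard error under normality is larger, roughly 1.25 times that of the mean.

```python
import math


def critical_ratio(diff: float, sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Difference between two independent group means divided by its standard error.

    SE_diff = sqrt(sd1**2 / n1 + sd2**2 / n2).
    """
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff / se_diff


# Hypothetical standard deviations (the survey does not report them here).
cr = critical_ratio(diff=3.0, sd1=5.0, n1=50, sd2=5.0, n2=50)
print(f"critical ratio = {cr:.2f}")  # 3 / sqrt(0.5 + 0.5) = 3.00
print("reliable at the 3.0 convention:", cr >= 3.0)
```

With these assumed spreads, a three-point difference between two groups of 50 reaches a critical ratio of exactly 3.0. A smaller standard deviation would make a smaller difference reliable, which is consistent with the survey's remark about the two-point difference with 100 cases.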


Results by Type and Size of School in Upper Grades


Table VII shows medians and corresponding approximate percentiles by type and size of school for Liabilities, Assets, and Total Score for the different groups from the fourth through the eleventh grades.

On the Liabilities sub-tests, pupils in mill schools and in schools with 4-6 teachers made the lowest scores. The most favorable Liabilities scores were made by pupils in schools with 7-10 teachers. This finding supports the data in Table VI for the primary grades. On the Assets sub-tests, however, pupils in the 7-10 teacher schools made slightly less favorable scores than pupils in the other groups. All groups on the Assets exceeded the 50 percentile, with four groups reaching the 65 percentile. On the Total Score, percentiles for six of the seven groups fell very close to 50; the largest school group, 20 or more teachers, reached the 60 percentile. There is definite evidence from the Mental Health Analysis of a relationship between size of school and mental health adjustment scores. Favorable total scores increased gradually from small schools to large schools, with the most favorable scores being made by pupils in schools with twenty or more teachers.

There is no evidence in this study for a difference in mental health adjustment scores between urban and rural groups. In fact, on Liabilities, Assets, and the Total Score, the percentiles for urban and rural groups were exactly the same. The mill school group made a substantially lower percentile, 44, than did the urban or rural groups. The small school groups made slightly lower percentiles than did the urban or rural groups.


On the Liabilities sub-tests, a difference in central tendency between groups of four points was found to be a highly reliable difference. On the Assets sub-tests and on the test as a whole, central tendency differences of three and five points respectively were found to be reliable. Accordingly, it may be pointed out that on the Liabilities the 20 or more teacher schools group showed reliable superiority in scores over both the mill schools group and the 4-6 teacher schools group. On the Assets, the urban, rural, 4-6 teacher schools, 11-19 teacher schools, and 20 or more teacher schools groups each showed reliably superior scores to each of the following groups: the mill schools, 4-6 teacher schools, 7-10 teacher schools, and 11-19 teacher schools. Also, the urban and rural schools groups made significantly higher scores than did the mill schools group.


School Grade, Sex, and Race 


Table VIII shows percentiles based on medians 
by grade or group, sex, and race on the Califor- 
nia Test of Personality (grades 1, 2, and 3) and 
on the Mental Health Analysis (grade 4 to
adults). On the personality test, white pupils 
in each of the primary grades exceeded the 50 
percentile. On Self Adjustment, the percentile 
range was from 55 to 65; on Social Adjustment, 
it was from 75 to 95. Except for grade 2 on 
Self Adjustment, girls made substantially 
better scores than boys on the personality test. 
The superiority of girls over boys was marked 
on Social Adjustment. On Total Adjustment, 
white girls showed a 10 percentile superiority 
over boys in each of the primary grades. 
Colored pupils in grades 1 and 2 made lower 
scores than whites. In grade 3, colored pu- 
pils exceeded the whites by 5 percentile 
points on the test as a whole. Sex differences 
among colored pupils were slightly less mark- 
ed than among whites. In grades 1 and 3, col- 
ored girls showed a 5 percentile superiority 
over colored boys. In grade 2 on Self Adjust-
ment, colored boys were 5 percentile points above
girls, while in grade 3 on Social Adjustment
colored boys and girls each reached the 95 per- 
centile. The percentile range for colored pri- 
mary pupils was from 35 to 60 on Self Adjust- 
ment and from 45 to 95 on Social Adjustment. 


On the Mental Health Analysis Test, Liabil- 
ities, Assets, and Total Score, the following 
observations from Table VIII may be made. 
With respect to mental health Liabilities, the 
whites showed definite though uneven gains in 
favorable scores from the fourth grade up to 
college and adult years. The percentile range 
was from 35 in the fourth grade to 73 in the 
college senior group. Small sex differences 
among white pupils on the Liabilities were 
observed. In grades 4, 8, 11, and in the col- 
lege senior and adult groups, the differences 
were 10 or more percentile points with boys 
showing better results in grades 8 and 11 and 
in the adult group. The average percentiles 
of whites for the 13 groups on Liabilities for 
boys and girls were 57 and 56 respectively, 
indicating no real sex differences among 
whites on this part of the test. For the col- 
ored pupils on the Liabilities, no definite 
improvement in scores was made from the 
lower to upper grades. In fact, the second 
highest percentile of 45 was made by the 
fourth grade. Among colored pupils, the av- 
erage percentiles on Liabilities for the 9 groups 
were 38 for boys and 39 for girls, indicating no 
significant sex difference on this part of the 
test. 
Turning to the mental health Assets, Table
VIII reveals that all grades and groups except 
the white 11th grade exceeded the 50 percentile. 
The percentile range for whites was from 46 to 
80; for colored pupils, 54 to 80. The most fav- 
orable scores were made by the college groups. 
The average percentile for whites on Assets 
was 65 for boys and 69 for girls, indicating a 
small sex difference in favor of girls. Among 
the colored groups, no sex difference was in- 
dicated, the average percentiles on the Assets
for the nine groups being 68 for boys and 69 
for girls. It is interesting to note that no dif- 
ferences on the Assets were found between 
white and colored pupils when percentiles of 
all grades and groups were averaged. 


On the Mental Health Analysis as a whole, 
Total Score, Table VIII shows that uneven
percentile gains were made from the fourth 
grade to the college and adult groups among 
whites. No perceptible gains were made among 
colored pupils from the fourth to the eleventh 
grades. Among whites above the third grade, 
girls showed an average superiority of 8 per- 
centile points over boys. Colored girls showed 
approximately 10 percentile points superiority 
over colored boys. 


Tabulations by age groups from ten to twen- 
ty revealed substantially the same results as 
shown in Table VIII. Boys and girls at
age ten reached the 50 and 40 percentiles re- 
spectively; at age twenty both boys and girls 
reached the 70 percentile. Boys at eleven, 
twelve, thirteen, and fourteen made somewhat 
more favorable scores than boys at fifteen, 
sixteen, and seventeen, especially on the Lia- 
bilities. Boys and girls in the college groups 
made higher scores than did younger age 
groups. Approximately five grade levels or 
school years were represented at each age 
level, boys showing more variability than 
girls. 


III. Findings and Suggestions


The chief findings of the Mental Health Survey
of a representative sampling of individuals
in the county and city of Spartanburg,
South Carolina, comprising all age, sex, race, 
social, economic, and educational groups, are 
herewith summarized. 

1. A sampling of 575 primary grade pupils 
(425 white, 150 colored) on the California Test 
of Personality reached or exceeded the norms
in every part of the test. The Spartanburg 
total score median was equivalent to the 75 per- 
centile of the group on whom the norms are 
based. In Self-Adjustment, the Spartanburg 
pupils as a group made only slightly better than 
typical scores; while on Social Adjustment the 
Spartanburg pupils reached the 80 percentile. 
2. A sampling of 2372 individuals (1906
white, 466 colored), ranging in grade level 
from the fourth through college to adults, on 
the various series of the Mental Health Analy- 
sis, reached the 59 percentile of the group on 
whom the norms are based. As a combined 
group, the Spartanburg sample above the third 
grade reached the 53 percentile on Mental 
Health Liabilities and the 68 percentile on Men- 
tal Health Assets. 





3. From five to ten percent of primary and 
upper grade pupils made very low mental health 
scores and should benefit from special attention 
in the areas of disturbance. Approximately two- 
thirds of those making very low scores were 
boys. The area of emotional conflicts appeared 
to be the chief mental health hazard. Many pri- 
mary grade pupils made low scores on Nervous 
Symptoms and Withdrawing Tendencies. In the 
upper grades, the Emotional Instability sub- 
test accounted for a large proportion of low 
scores. At all grade levels the test data re- 
vealed the need in Spartanburg of helping pu- 
pils resolve their emotional conflicts, relieve 
their anxieties, and develop more emotionally 
secure and independent personalities. 


4. On the Mental Health Analysis Total 
Scores, there were no clearly significant dif- 
ferences in group scores between girls and 
boys of the same ages. Total scores revealed 
a definite but uneven gain for both boys and 
girls from the ten to the twenty year level. 
Pupils of junior high school age and grade lev- 
el showed somewhat more favorable mental 
health test scores than pupils in the upper
elementary or in the senior high school grades.


5. Primary grade pupils in urban schools 
made significantly higher Self, Social, and 
Total Adjustment scores than did children in 
rural schools, mill schools, or schools with no 
more than six teachers. Young children in mod- 
erately sized schools with seven to ten teachers 
made somewhat better personality scores than 
children in smaller or larger schools. Lowest 
personality scores were made by primary grade 
children in schools with one to three teachers, 
mill schools, and schools with four to six tea- 
chers. 


6. In grades four to eleven, favorable men- 
tal health total scores increased gradually from 
small to large schools, with the most favorable
scores being made by pupils in schools with 
twenty or more teachers, indicating a definite 
relationship between mental health scores and 
size of school. On the Liabilities sub-tests the 
most favorable scores were made by pupils in 
schools with seven to ten teachers, substanti- 
ating somewhat the same finding among primary 
grade pupils. No differences in scores were 
found between urban and rural school groups on
the Mental Health Analysis. Interesting differ-
ences in scores for different type and size 
schools occurred on the sub-tests. For instan- 
ce, on Nervous Manifestations, highest and low- 
est medians were made by pupils in seven to 
ten teacher schools and mill schools respec- 
tively. On Social Participation, highest and 
lowest medians were made by pupils in rural 
and mill schools respectively. On Satisfying 
Work and Recreation, highest median was made 
by urban school pupils; lowest by mill school
pupils.


7. Girls showed a general superiority over 
boys in personality and mental health total 
scores at most grade levels. Of the sixteen 
groups, including eleven grade groups, four 
college class groups, and one adult group, girls 
made higher medians than boys and men in 
twelve groups, while boys made slightly higher 
medians than girls in the eighth, ninth, and 
eleventh grades, and in the college freshman 
group. In the first, third, and fourth grades, 
and in the college junior and senior groups, the 
superiority of girls exceeded ten percentile 
points. In none of the four cases in which boys 
exceeded girls was the superiority more than 
three percentile points. Girls in twelve groups 
showed an average superiority over boys of 
eight percentile points. Boys in four groups 
had an average superiority over girls of three 
percentile points. Sex differences among white 
pupils were approximately the same as among 
colored pupils with respect to total scores. 

The findings based on all grades and groups re- 
vealed a small but significant total score sex 
difference in personality and mental health atti- 
tudes in favor of girls. 


8. In nine out of twelve grades and
groups, white pupils showed an average super- 
iority of twelve percentile points over colored 
pupils with respect to total test scores. Three 
colored grade groups showed an average super- 
iority of five percentile points over white pupils. 
White pupils made definite gains in medians 
from the fourth grade to the college and adult 
groups. No perceptible gains were made among 
colored pupils from the fourth to the eleventh 
grades. Among whites, girls in the primary 
grades and at the college junior and senior lev- 
el showed a superiority over boys. Among col- 
ored pupils, girls in the fourth, fifth, and sev- 
enth grades showed an average superiority of 
sixteen percentile points over boys. 


9. The most favorable mental health group 
scores were made by college students, total 
score medians reaching the 67, 65, 75, and 75 
percentiles in the freshman, sophomore, junior, 
and senior groups respectively. The junior and 
senior scores were significantly higher than 
freshman and sophomore medians. The average 
total score median percentile for fourth, fifth, 
and sixth grades was 53; for seventh and eighth
grades, 59; for ninth, tenth, and eleventh grades, 
52; and for the four college classes, 71. The 
total score adult group median percentile was 
66. Scores for all groups above the third grade
are comparable, since the same test was administered,
although different series of the test were used
at various grade levels. No sex differences in
total scores between college freshmen and soph- 
omores were disclosed. Among juniors and 
seniors, medians for girls in each instance ex- 
ceeded medians for boys by eleven percentile 
points, a significant sex difference. College 
girls showed a slightly greater superiority over 
boys on the Assets than on the Liabilities sub- 
tests. College juniors and seniors showed 
clear superiority over freshmen and sopho- 
mores in maturity of behavior, emotional sta- 
bility, feelings of adequacy, and in positive 
attitudes toward work and recreation. 


10. Veterans in college showed a mental 
health total score percentile of 65, six percentile
points below that of the average for all
college students including veterans. The most
favorable scores were made by the veterans on
the following sub-tests: Physical Defects, Ner- 
vous Manifestations, Close Personal Relation- 
ships, Inter-Personal Skills, Social Participa- 
tion, and Adequate Outlook and Goals. Least 
favorable scores, although still close to the 50
percentile, were made by veterans on Behavioral
Immaturity, Emotional Instability, Feelings
of Inadequacy, and Satisfying Work and Recrea- 
tion. Veterans made higher medians on the 
Assets than on the Liabilities. No significant 
differences in veteran scores among the four 
college classes were noted, only a slight tend- 
ency for junior and senior veterans to exceed 
freshmen and sophomores. 


11. Adults achieved a Total Score percentile 
of 66 on the Mental Health Analysis. With
respect to the Liabilities, the adults fell below the
norms with low scores on maturity of behavior, 
emotional stability, and feelings of adequacy. 
On Assets, the adults made high scores, reach- 
ing the 91 percentile. The data suggest that, 
although most adults acquire the requisites for 
good personal and social adjustment, they fre- 
quently show regression in behavior and atti- 
tudes, display emotional conflicts, and suffer 
serious feelings of inferiority. White adult men 
made significantly more favorable scores on
Liabilities than white women, but only slightly 
higher scores than women on the Assets
subtests. Whites made significantly more favorable
mental health scores than colored adults,
especially on the Liabilities. Colored adult men 
(17 cases) exceeded colored women on the 
Assets. Adults, no less than school pupils, have 
their difficulties of adjustment and need assis- 
tance in reducing and correcting their mental 
health liabilities. 


12. Negative skewness characterized the 
majority of the distributions on the California
Test of Personality and the Mental Health Analysis.
The amount of variation of scores was less at the upper than at the
lower grade levels. The higher and more fav- 
orable group scores at the upper levels were 
accompanied by lower standard deviations. On 
certain sub-tests, notably Freedom From Ner- 
vous Symptoms and Emotional Instability, rela- 
tively low scores were made, upper grades 
making no better scores than lower grades, and 
standard deviations showing no reduction in the 
upper grades. 
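The averages quoted in findings 9 and 10 can be checked directly. The Python sketch below makes two assumptions, since the report does not say how its figure of 71 was obtained. First, the "average for the four college classes" is taken to be an unweighted mean of the four class medians, so each class counts once. Second, the mean is rounded half-up. A mean weighted by class size would come out lower, near 69.

```python
import math

# Median total-score percentiles on the Mental Health Analysis, as quoted
# in findings 9 and 10 (freshman, sophomore, junior, senior; veterans).
COLLEGE_MEDIANS = {"freshman": 67, "sophomore": 65, "junior": 75, "senior": 75}
VETERAN_MEDIAN = 65


def round_half_up(x: float) -> int:
    """Round to the nearest integer with .5 going up.

    Python's built-in round() sends 70.5 to 70 (round half to even),
    which would not reproduce the reported figure.
    """
    return math.floor(x + 0.5)


def college_average(medians: dict) -> float:
    """Unweighted mean of the class medians (an assumption, see lead-in)."""
    return sum(medians.values()) / len(medians)


avg = college_average(COLLEGE_MEDIANS)        # 70.5
reported_avg = round_half_up(avg)             # 71, as in finding 9
veteran_gap = reported_avg - VETERAN_MEDIAN   # 6 points, as in finding 10

print(avg, reported_avg, veteran_gap)
```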





To foster and promote among young and old a 
sound mental health interest and viewpoint has 
been the primary purpose of this study. This 
interest is essential to life and character and 
education today. Upon it may be based a new 
kind of emotional education which will help to 
prevent children from developing mental illness, 
juvenile delinquency, and serious maladjustment. 
As for teachers, it has long been known that they 
may have dependable competency in subject 
matter, yet be totally lacking in understanding 
boys and girls. The viewpoint here stressed de- 
mands that teachers seek to identify and reduce 
in themselves and in pupils mental health liabil- 
ities and limitations on the one hand, and ampli- 
fy mental health assets and resources on the 
other. At the same time, teachers must increa- 
se respect for themselves and for the personal- 
ities of pupils, recognize their own and the lim- 
itations of others, and come into a better under- 
standing of basic personality needs. It is in this 
realm of mental and moral health that the 
school and community of tomorrow must make 
their greatest contribution. 
TABLE I

NUMBER OF INDIVIDUALS IN THE SPARTANBURG MENTAL HEALTH SURVEY BY GRADE, SEX, AND RACE

Grade or            Whites              Negroes                     Totals
Group          Boys  Girls  Total   Boys  Girls  Total   All Boys  All Girls  All Boys & Girls
1                75    62    137      25    31     56       100        93          193
2                80    78    158      31    30     61       111       108          219
3                66    64    130      16    17     33        82        81          163
4                69    70    139      26    35     61        95       105          200
5                72    85    157      25    27     52        97       112          209
6                68    61    129      24    28     52        92        89          181
7                80    80    160      21    24     45       101       104          205
8                70    68    138      22    23     45        92        91          183
9                75    95    170      19    28     47        94       123          217
10               65    84    149      19    29     48        84       113          197
11               57    73    130      19    24     43        76        97          173
College Fresh.  114   124    238                            114       124          238
College Soph.    93   105    198                             93       105          198
College Junior   35    76    111                             35        76          111
College Senior   44    49     93                             44        49           93
Adults           36    58     94      17    56     73        53       114          167
Totals         1099  1232   2331     264   352    616      1363      1584         2947
Totals %         49    51    100      42    58    100        46        54          100
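The Python sketch below is a quick consistency check on Table I. It recomputes every row total and the four sex-by-race column totals. The grade 6 white-girls count is entered as 61. That value balances both the row (129 whites, 181 in all) and the column (1232), and Table VIII gives the same figure. The college classes are entered with zero Negro pupils because the table lists none for them.

```python
# Table I counts: (group, white boys, white girls, Negro boys, Negro girls,
# printed total of all boys and girls).
ROWS = [
    ("1", 75, 62, 25, 31, 193),
    ("2", 80, 78, 31, 30, 219),
    ("3", 66, 64, 16, 17, 163),
    ("4", 69, 70, 26, 35, 200),
    ("5", 72, 85, 25, 27, 209),
    ("6", 68, 61, 24, 28, 181),
    ("7", 80, 80, 21, 24, 205),
    ("8", 70, 68, 22, 23, 183),
    ("9", 75, 95, 19, 28, 217),
    ("10", 65, 84, 19, 29, 197),
    ("11", 57, 73, 19, 24, 173),
    ("College Fresh.", 114, 124, 0, 0, 238),
    ("College Soph.", 93, 105, 0, 0, 198),
    ("College Junior", 35, 76, 0, 0, 111),
    ("College Senior", 44, 49, 0, 0, 93),
    ("Adults", 36, 58, 17, 56, 167),
]


def row_mismatches(rows):
    """Return the groups whose four counts do not add up to the printed total."""
    return [name for name, wb, wg, nb, ng, total in rows
            if wb + wg + nb + ng != total]


def column_totals(rows):
    """Sum each sex-by-race column (white boys, white girls, Negro boys, Negro girls)."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


cols = column_totals(ROWS)
print(row_mismatches(ROWS))   # []
print(cols, sum(cols))        # (1099, 1232, 264, 352) 2947
```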
TABLE II

RESULTS OF THE SPARTANBURG SAMPLE OF 575 SCHOOL PUPILS IN GRADES ONE, TWO, AND THREE TOGETHER WITH PERCENTILES ON THE CALIFORNIA TEST OF PERSONALITY, PRIMARY SERIES, FORM A

                                                         Approx.
Sub-Tests                                Range  Medians  Percentiles  S.D.
A. Self-reliance                          2-8     6.4       70        1.31
B. Sense of Personal Worth                0-8     7.1       80        1.49
C. Sense of Personal Freedom              1-8     6.5       70        1.43
D. Feeling of Belonging                   0-8     6.6       60        1.28
E. Freedom from Withdrawing Tendencies    0-8     6.3       60        1.67
F. Freedom from Nervous Symptoms          0-8     5.7       50        1.65
Self Adjustment                          15-49   35.3       55        5.60
A. Social Standards                       1-8     7.8       65        1.19
B. Social Skills                          0-8     7.3       75        1.36
C. Freedom from Anti-Social Tendencies    0-8     7.8       70        1.52
D. Family Relations                       1-8     7.6       75        1.36
E. School Relations                       1-8     8.1       90        1.22
F. Community Relations                    1-8     7.7       75        1.31
Social Adjustment                        15-49   42.7       80        5.75
Total Adjustment                         40-99    …         75       10.20
TABLE III

RESULTS OF THE SPARTANBURG SAMPLE OF 2372 INDIVIDUALS FROM FOURTH GRADE TO ADULTS TOGETHER WITH PERCENTILES ON THE VARIOUS SERIES OF THE MENTAL HEALTH ANALYSIS

                                                       Approx.
Sub-Tests                           Range    Median   Percentile    S.D.
A. Behavioral Immaturity             1-20     14.0       52         3.45
B. Emotional Instability             0-20     11.3       43         4.12
C. Feelings of Inadequacy            1-20     13.3       52         4.17
D. Physical Defects                  1-20     18.8       82         3.42
E. Nervous Manifestations            1-20     15.6       58         3.85
Liabilities                         20-99     69.7       53        14.60
A. Close Personal Relations          4-20     18.9       78         2.52
B. Inter-Personal Skills             3-20     17.3       17         2.75
C. Social Participation              2-20     16.6       66         3.10
D. Satisfying Work & Recreation      3-20     15.3       56         2.99
E. Outlook and Goals                 2-20     17.8       70         3.85
Assets                              15-100    82.8       88         9.85
Total Score                         70-199   151.6       59        20.00
TABLE IV

MEDIANS AND APPROXIMATE PERCENTILE VALUES OF THE SPARTANBURG SAMPLE OF 575 PRIMARY PUPILS BY GRADES ON THE CALIFORNIA TEST OF PERSONALITY

                                            Grade 1           Grade 2           Grade 3
                                            (N 193)           (N 219)           (N 163)
Sub-Tests                              Med. Percentile   Med. Percentile   Med. Percentile
A. Self-Reliance                        6.3     70        6.4     70        6.7     70
B. Sense of Personal Worth              6.9     60        7.1     80        7.1     80
C. Sense of Personal Freedom            6.5     70        6.6     70        6.4     70
D. Feeling of Belonging                 6.7     60        6.6     60        6.5     60
E. Freedom from Withdrawing Tendencies  6.5     60        6.2     60        6.3     60
F. Freedom from Nervous Symptoms        6.1     60        5.6     50        5.5     50
Self Adjustment                        35.6     55       35.0     55       35.5     55
A. Social Standards                     7.6     65        7.9     65        8.1     90
B. Social Skills                        6.9     50        7.5     75        7.4     75
C. Freedom from Anti-Social Tendencies  7.7     70        7.9     70        8.0     90
D. Family Relations                     7.5     75        7.4     75        8.1     95
E. School Relations                     7.9     70        8.0     90        8.3     90
F. Community Relations                  7.4     75        7.7     75        8.1     95
Social Adjustment                      41.6     75       42.6     80       44.4      …
Total Adjustment                       75.1     65       77.2     70       77.9     70



















JOURNAL OF EXPERIMENTAL EDUCATION 





TABLE V

APPROXIMATE PERCENTILE VALUES (ESTIMATED FROM MEDIANS) OF 2372 INDIVIDUALS IN THE SPARTANBURG SAMPLE BY GRADES, SCHOOL YEARS, AND OTHER GROUPS ON THE MENTAL HEALTH ANALYSIS

[Table printed sideways; its figures are not legible in this copy. Rows: L-A Behavioral Immaturity, L-B Emotional Instability, L-C Feelings of Inadequacy, L-D Physical Defects, L-E Nervous Manifestations, L-S Liabilities Score; A-A Close Personal Relationships, A-B Inter-Personal Skills, A-C Social Participation, A-D Satisfying Work and Recreation, A-E Adequate Outlook and Goals, A-S Assets Score; T-S Total Score; N. Columns: grades and groups. Veterans are included in the college groups.]



















TABLE VI

MEDIANS AND CORRESPONDING APPROXIMATE PERCENTILES BY TYPE AND SIZE OF SCHOOL FOR SELF ADJUSTMENT, SOCIAL ADJUSTMENT, AND TOTAL ADJUSTMENT ON CALIFORNIA TEST OF PERSONALITY, PRIMARY SERIES, GRADES ONE, TWO, AND THREE, WHITE SCHOOLS

Type & Size              Self Adjustment   Social Adjustment   Total Adjustment
of School          N      Med.   %-ile      Med.   %-ile        Med.   %-ile
Urban             109     38.5    70        45.5    95          82.3    80
Rural             148     34.4    50        41.3    75          74.7    60
Mill               46     34.3    50        40.0    70          74.2    60
1-3 Teachers       44     34.0    50        38.8    65          72.4    55
4-6 Teachers       52     34.7    50        42.1    80          76.5    65
7-10 Teachers     123     36.9    65        45.2    95          81.1    80
11-19 Teachers    102     36.4    60        42.6    80          78.5    70
20- Teachers      103     36.2    60        44.3    90          79.2    75

TABLE VII

MEDIANS AND CORRESPONDING APPROXIMATE PERCENTILES BY TYPE AND SIZE OF SCHOOL, FOR LIABILITIES, ASSETS, AND WHOLE TEST ON MENTAL HEALTH ANALYSIS, GRADES FOUR TO ELEVEN

Type & Size               Liabilities        Assets          Total Score
of School          N      Med.  %-ile      Med.  %-ile      Med.   %-ile
Urban             263     68.5   50        82.0   65        148.1   53
Rural             321     68.9   50        82.2   65        148.4   53
Mill              200     65.5   42        78.9   55        142.6   44
4-6 Teachers      224     66.7   43        82.1   65        145.3   48
7-10 Teachers     394     70.1   55        78.1   53        145.5   48
11-19 Teachers    405     67.8   48        81.4   62        147.4   52
20- Teachers      153     68.9   50        82.3   65        152.9   60
TABLE VIII

MEDIAN PERCENTILES BY GRADE, SEX, AND RACE FOR SELF ADJUSTMENT, SOCIAL ADJUSTMENT, AND TOTAL ADJUSTMENT ON CALIFORNIA TEST OF PERSONALITY; AND FOR LIABILITIES, ASSETS, AND TOTAL SCORE ON MENTAL HEALTH ANALYSIS

                                   Whites                   |                 Colored
Grade               N     N    %-ile  %-ile  %-ile |   N     N    %-ile  %-ile  %-ile
                  Boys  Girls   Boys  Girls  Total |  Boys  Girls   Boys  Girls  Total

Self Adjustment
1                  75    62     55     65     60   |   25    31     45     50     45
2                  80    78     60     60     60   |   31    30     40     35     40
3                  66    64     50     60     55   |   16    17     50     60     55

Social Adjustment
1                  75    62     75     85     80   |   25    31     45     70     65
2                  80    78     80     95     90   |   31    30     55     65     60
3                  66    64     80     95     85   |   16    17     95     95     95

Total Adjustment
1                  75    62     65     75     70   |   25    31     45     55     55
2                  80    78     70     80     75   |   31    30     50     50     50
3                  66    64     65     75     70   |   16    17     70     15      5

Liabilities
4                  69    70     28     41     35   |   26    35     40     50     45
5                  72    85     45     39     41   |   25    27     21     41     31
6                  68    61     62     60     61   |   24    28     43     38     40
7                  80    80     53     56     54   |   21    24     32     50     38
8                  70    68     66     55     60   |   21    24     29     30     30
9                  75    95     48     42     46   |   19    28     41     28     37
10                 65    84     55     54     54   |   19    29     49     54     52
11                 57    73     52     39     46   |   19    24     34     23     28
College Fresh.    114   124     58     58     58   |
College Soph.      93   105     59     61     60   |
College Juniors    35    76     68     70     70   |
College Seniors    44    49     69     85     73   |
Adults             36    58     78     65     70   |   17    56     36     37     37

Assets
4                  69    70     62     65     63   |   26    35     42     59     52
5                  72    85     56     59     58   |   25    27     59     74     68
6                  68    61     71     78     75   |   24    28     68     17     12
7                  80    80     60     70     65   |   21    28     78     73     75
8                  70    68     65     69     67   |   21    24      …     76     74
9                  75    95     59     60     60   |   19    28     17     65     69
10                 65    84     53     60     57   |   19    29     81     79     80
11                 57    73     43     48     46   |   19    24     58     50     54
College Fresh.    114   124     85     70     78   |
College Soph.      93   105     70     72     71   |
College Juniors    35    76     68     85     80   |
College Seniors    44    49     75     79     17   |
Adults             36    58     78     75     76   |   17    56     76     66     69

Total Score
4                  69    70     38     48     43   |   26    35     41     55     51
5                  72    85     48     47     48   |   25    27     30     49      …
6                  68    61     62     67     64   |   24    28     45     51     48
7                  80    80     55     64     60   |   21    24     50     65     57
8                  70    68     65     63     64   |   21    24     38     40     39
9                  75    95     47     47     47   |   19    28     50     45     48
10                 65    84     45     52     48   |   19    29     55     62     58
11                 57    73     56     51     53   |   19    24     50     47     49
College Fresh.    114   124     67     64     65   |
College Soph.      93   105     64     68     66   |
College Juniors    35    76     66     77     74   |
College Seniors    44    49     69     81     74   |
Adults             36    58     75     69     71   |   17    56     59     58     58
THE STABILITY OF MENTAL TEST PERFORMANCE
BETWEEN TWO AND EIGHTEEN YEARS*


M. P. HONZIK, J. W. MACFARLANE, and L. ALLEN** 


Institute of Child Welfare 
University of California 


In an earlier study, the constancy of mental 
test performance was reported for a group of 
normal children during their preschool years 
(8). These children are now young adults, and 
it is possible to show the relative stability or 
lability of their mental test scores over the entire
period of testing, 21 months to 18 years, inclusive.
The contribution of the present study lies, first,
in the repeated individual tests given at specified
ages over a 16-year period to more than 150
children and, second, in the fact that this group
of children was selected so as to be a representative
sample of the children born in an
urban community during the late 1920’s. Fur- 
thermore, since the Guidance Study has as its 
primary purpose the study of personality devel- 
opment and associated factors, it has been poss- 
ible to note the relation of fluctuations or sta- 
bility in rate of mental growth to physical ills, 
unusual environmental strains or supports, and 
to evidences of tension or serenity within the 
individual child. 


The Sample 


The Guidance Study has been described in de- 
tail in previous publications (10, 11, 12). Suf- 
fice it to say here that the two groups, which are 
referred to as the Guidance Control Groups, 
constitute representative subsamples of the 
Berkeley Survey. The names of every third
child born in Berkeley between January 1, 1928,
and June 30, 1929, were included in the Berkeley
Survey (15). A total of 252 children from the
Berkeley Survey were asked to come to
the Institute for their first mental test at the 
age of 21 months. At this age level, the group 
of 252 children was divided into two matched 
subsamples of 126 children on the basis of socio- 
economic factors (parents’ national derivation, 
income, father’s occupation, socio-economic 
rating, neighborhood, and mother's age and
education). One of these subsamples (of the
Berkeley Survey) has been called the "Guidance
Group" because of the program of intensive
interviews held with the parents and children; the
second group, which has had physical examina- 





tions and mental tests but fewer and less inten- 
sive interviews and these at a much later age of 
the child, has been called the "Control Group".
The children in both groups were given mental
tests at the age of 21 months. At ages 2 and 2½
years, only the children in the Guidance Group
were tested. Thereafter, the testing program 
was the same for the two groups. 


Every effort was made to test the children as
nearly as possible on their birthdays.
Actually, from 72 to 95 percent of the children
were tested within one month of their birthdates
at the various ages up to and including 8 years.


As was to be expected in a longitudinal study, 
a number of children were unable to come in 
for one or more of the mental tests. The most 
frequent cause of a missed test was the family 
being "out of town". However, a number of
families lost interest or became uncooperative
as their children grew older; one child was
killed in an automobile accident. Tables II and
III show the number of children tested at each
age level. It will be seen that at 18 years 153 of
the 252 children were tested on the Wechsler-
Bellevue. The reasons that the remaining 99
did not come in for a test are listed in the fol- 
lowing table: 


                                      Guidance    Control
                                          n           n
"Out of town"                            24          26
Uncooperative                            17          16
Died                                     --           1
Case closed early, cause unknown         --           6
Missed 18-year test (due to changes
  in staff, illness, or
  transportation difficulties)            5           4


*The data of this investigation are those of the Guidance Study, which is being carried on at the University of California Institute of Child Welfare under the direction of Dr. Jean W. Macfarlane.

**Over 2500 individual mental tests were given the 252 children observed; of this number, over 1500, or 60 percent, were given by Lucile Allen, who also assembled certain of the case material. The case summaries were written by J. W. Macfarlane. M. P. Honzik is largely responsible for the analysis of the data and certain of the conclusions drawn. Acknowledgement is also due to the many statisticians who have helped in the data compilation, J. Delaney, M. Snyder, E. Laws, L. Klein, and H. H. Hoffman.
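The counts in the list of reasons can be reconciled with the 99 children who missed the 18-year test. The Python sketch below takes the printed Guidance counts and the four legible Control counts, then solves for the one Control entry in the last row that is hard to read. It also assumes each group started with the 126 children of the matched subsamples described under The Sample. On that basis it derives how many children in each group were tested at 18. Those per-group figures are implied by the arithmetic, not stated by the authors.

```python
ENROLLED = 252        # Guidance and Control Groups together
GROUP_SIZE = 126      # each matched subsample
TESTED_AT_18 = 153    # children given the Wechsler-Bellevue at 18 years

guidance = {"out of town": 24, "uncooperative": 17, "died": 0,
            "closed early": 0, "missed test": 5}
control_legible = {"out of town": 26, "uncooperative": 16, "died": 1,
                   "closed early": 6}

not_tested = ENROLLED - TESTED_AT_18
# The only unknown is the Control count for the missed 18-year test.
control_missed = not_tested - sum(guidance.values()) - sum(control_legible.values())

guidance_tested = GROUP_SIZE - sum(guidance.values())
control_tested = GROUP_SIZE - sum(control_legible.values()) - control_missed

print(not_tested, control_missed)                             # 99 4
print(guidance_tested, control_tested, guidance_tested + control_tested)  # 80 73 153
```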


The reasons for lack of cooperation on the
part of the parents and children are many and
varied. Three children were embarrassed by
physical examinations. One father objected
to his daughter having a physical examination.
Two children objected to taking mental tests
(… 101 and 110, … at 15 years). One uncooperative
family was in … not want to discuss …

… a rough method of evaluating the … at 18 years:

Education of Parents    At 18 years    At 10 years
                             %              %
College                     40             33
High School                 40             46
Grammar School               …              …

This comparison shows that more of the parents
of children who were tested at 18 years were
college trained, fewer had a high school
education, and the same proportion had a grammar
school education, as was true of the group as a
whole. These differences may … but do not
invalidate our conclusions.


The Testing Program 


The testing program followed in the Guidance 
Study is summarized in the following table: 





Ages                     Test

21 months - 5 years      California Preschool Schedule I or II
6 and 7 years            Stanford-Binet, 1916 Revision
8 years                  Stanford Revision, Form L
9 - 15 years             Stanford Revision (either Form L or M; see Table I)
18 years                 Wechsler-Bellevue





During the preschool years, 21 months to 5 
years, inclusive, each child was tested at suc- 
cessive age levels on the same test, either the 
California Preschool Schedule I or California 
Preschool Schedule II.¹ Beginning at age 9,
test forms were alternated so as to show the
effects of a change in the form of the test on
mental test constancy (see Table I). As may be
seen in this table, all
the children in both groups were tested on 
either Form L or Form M of the Stanford Re- 
vision at age 9 years. But at ages 12 and 14 
years, only two-thirds of the groups were given 
mental tests; the remaining third of the groups 
was tested at ages 13 and 15 years. In pre- 
senting group results, the scores for ages 12 
and 13 years have been considered together, as 
have scores for ages 14 and 15 years. 


The I.Q.’s obtained on the Stanford tests and 
the Wechsler-Bellevue were converted into 
sigma or standard scores so that they would be 
in comparable form to the mental test sigma 
scores obtained between 21 months and 5 years. 
The means and standard deviations of the I.Q.'s
which were used in computing the sigma scores 
are given in Table II. The mean I.Q.’s for the 
combined Guidance and Control Groups shown in 
the last columns of the table were the ones used 
in computing sigma scores for individual child- 
ren. 
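Converting an I.Q. to a sigma (standard) score means expressing its distance from the group mean in standard-deviation units. The Python sketch below uses illustrative figures, not the Table II values, which are not reproduced here. It takes a mean of 118.5, inside the 118.3 to 118.7 range reported for ages 6 to 8, and a standard deviation of 13, the approximate value the authors give for ages 6 and 7.

```python
def sigma_score(iq: float, mean: float, sd: float) -> float:
    """Standard score: how many standard deviations an I.Q. lies from the group mean."""
    return (iq - mean) / sd


def iq_from_sigma(z: float, mean: float, sd: float) -> float:
    """Inverse conversion: the I.Q. that corresponds to a given sigma score."""
    return mean + z * sd


MEAN_IQ = 118.5  # illustrative mean, not the Table II value
SD_IQ = 13.0     # approximate SD at ages 6 and 7

print(sigma_score(131.5, MEAN_IQ, SD_IQ))   # 1.0
print(iq_from_sigma(-1.0, MEAN_IQ, SD_IQ))  # 105.5
```

Each age level has its own mean and SD. A child whose I.Q. rises or falls with the group therefore keeps the same sigma score, which is what puts the preschool and school-age results on a common footing.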


Table II also shows that, although these child- 
ren were selected as a representative sample of 
urban children, their scores are considerably 
above the test norms. The average I.Q. on the 
Stanford-Binet at ages 6 and 7 years and on the 
Stanford Revision, Form L at 8 years varied 
from 118.3 to 118.7. During the age period 9 to 
13 years the average I.Q. was approximately
120. The highest average I.Q. (…23) was obtained
for the test period 14 and 15 years; and
the lowest I.Q. average (118.2) was earned on 
the Wechsler-Bellevue at 18 years. 


The distribution of the I.Q.’s may be seen in 
Figure 1 for age periods 6 through 18 years. 
These percentage distributions of I.Q.'s are
relatively normal at all ages at which the
Stanford-Binet or Form L or M of the Stanford
Revision were the tests given. But at 18 years,
the distribution of I.Q.'s on the Wechsler-Bellevue
suggests that this test lacks "top" or at
least does not differentiate between the
children earning the highest scores at the earlier
ages.² Bayley (1) has another explanation for
the decreased variability at maturity. She
suggests that variability is greatest at the ages
when the children are acquiring the
functions being tested and that variability
becomes restricted with the approach to maturity
of the particular processes being measured.





1. The published California Preschool Scale Form A (9) is composed largely of items from the 
California Preschool Schedule I, together with a few items from the California Preschool Schedule 
II. The test items for the California Schedules I and II include selections made by Dr. Adele S. 
Jaffa from several standardized tests, together with some original items first validated at the
Institute. These scales have been normed by the Thurstone method of absolute scaling.


2. J. H. Ranzoni and R. D. Tuddenham are preparing a more detailed evaluation of the scores
earned by these children on the Wechsler-Bellevue in contrast to their scores on earlier tests. 
Group Trends in Mental Test Stability


Pearsonian coefficients of correlation between 
test scores earned at specified ages between 21 
months and 18 years are shown in Table III.
These correlation coefficients are based on the
scores of the children in the combined Guidance
and Control Groups for all but two age levels
(2 and 2½ years), when only the children in the
Guidance Group were tested.





Correlations for adjacent ages indicate a fair
degree of mental test constancy when the interval
between tests is at a minimum. The range
of correlations for adjacent ages varies from
r = .71 (between ages 21 months x 2 years; 2 x
2½ years; 3 x 3½ years; and 5 x 6 years) to r =
.92 for the ages 12 x 14 years on the Stanford
Revision, Form L. However, the correlations
decrease markedly with the interval between 
tests but tend to increase with the age of the 
children when tested. 


Comparison of the correlation coefficients for 
three-year intervals shows clearly the increase 
in mental test constancy with age: 


2 x 5 years              r = .32 ± .06
3 x 6 years              r = .57 ± .05
4 x 7 years              r = .59 ± .04
5 x 8 years              r = .70
7 x 10 years             r = .78
9 x 12 or 13 years       r = .85
14 or 15 x 18 years      r = .79


While the correlation between tests given at 2 and
at 5 years (r = .32) suggests a prediction which is
not much better than chance, the magnitude
of the test-retest correlation increases markedly
with age.


The importance of both age and interval be- 
tween tests on the test-retest correlation is 
shown by the relation of these r's (of Table III)
to the age ratio (age at first test/age at second
test), r = .85 (8).
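The Python sketch below shows how the age ratio behaves. It pairs the seven three-year-interval coefficients listed above with their age ratios and computes a Pearson correlation between the two. It makes two assumptions. The "12 or 13" and "14 or 15" ages are entered at their midpoints, 12.5 and 14.5. Only these seven coefficients are used, not the full Table III on which the authors' figure is based, so the result is not expected to match it exactly.

```python
from math import sqrt

# (age at first test, age at second test, test-retest r), from the list of
# three-year intervals; 12.5 and 14.5 are assumed midpoints.
PAIRS = [
    (2, 5, 0.32),
    (3, 6, 0.57),
    (4, 7, 0.59),
    (5, 8, 0.70),
    (7, 10, 0.78),
    (9, 12.5, 0.85),
    (14.5, 18, 0.79),
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


age_ratios = [first / second for first, second, _ in PAIRS]
rs = [r for _, _, r in PAIRS]
print([round(a, 2) for a in age_ratios])
print(round(pearson(age_ratios, rs), 2))
```

Every pair here holds the interval at three years. The rise in r therefore follows the rise in the age at first test, and that is the pattern the age ratio is meant to capture.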


The relation of test scores earned at four 
specified ages (21 months, 3 years, 6 years, 
and 18 years) to test scores earned at all 
other ages may be seen in Chart 1. In the up- 
per left quadrant of this Chart is shown the 
correlation of the 21 month test with scores at 
later age levels. We note a marked decrease
with age, especially during the preschool years.
However, the correlation between the 21-month
and 2-year tests (r = .71) indicates that the first
test given the children at the age of 21 months
was fairly reliable.


The results of the upper right quadrant, show- 
ing the correlation of the 3-year mental test 
scores with scores at other ages, should be 
compared with those of the upper left quadrant. 
The correlations of the 3-year test with scores
at the adjacent ages 2½ and 3½ years are fairly
high (r's are .71 and .73) but decrease to values
indicating poor prediction by 9 years
(r = .43).


Since Stanford tests are frequently given to 
children in the first grade, the results given in 
the lower left quadrant should be of interest to 
educators. The 6-year I.Q.’s are fairly con- 
stant, but the correlation coefficients are not 
sufficiently high so that the possibility of mark- 
ed changes in the I.Q.’s of individual children 
is precluded. 


The fourth quadrant of this chart shows the 
increasing prediction with age of success on 
the Wechsler-Bellevue at 18 years. The 
writers are concerned that the correlations 
with the Wechsler-Bellevue are not higher. 
Restricted variability, regardless of its cause, 
is probably a contributing factor. Another 
factor may be the differences in the types of 
test items included in the Stanford and Wechs- 
ler-Bellevue tests. 


Effect of Change of Form of Test on Mental Test
Constancy


Comparisons between correlations earned on 
the same and different forms of the Stanford 
Revision over specified age periods are shown 
in Table IV. 


The correlation between the 8- and 9-year 
tests for children tested on the same form of 
the Stanford (Form L) is .91; but the correla- 
tion is even higher for the remainder of the 
group who were tested on Form L at 8 years 
and Form M at 9 years (r = .93). Comparison 
of the effect of change of form on the test- 
retest correlations is made for six age per- 
iods. In all of these comparisons, the differ- 
ence between the test-retest coefficients, when 
the same or different forms of the Stanford 
test were used, was negligible. Bayley obtained
similar results in the Berkeley Growth Study.


Changes in Scores Over Certain Age Periods


The correlation coefficients in Table III in- 
dicate the group trends with respect to the con- 
stancy of mental test performance. It is also 
of interest to know the extent of the changes in 
sigma scores or I.Q. which are occurring in 
individual children. Furthermore, the question 
arises as to whether the correlation between 
mental test scores is largely determined by a 
relatively small proportion of the cases or by
the group as a whole. In an earlier report,
we published the distribution of changes in
sigma scores which occurred between the 6-
and 7-year tests (r = .82) for these children.
This distribution was normal, with 80 percent
of the group showing sigma score changes of
.5 or less. However, there were six children
whose scores changed by … (the standard
deviation for ages 6 and 7 years is approximately
13) or more. The average change in score between
6 and 7 years was .5 of a sigma (6.5 I.Q. points).
If changes in I.Q. of 20 points can occur
between the 6- and 7-year tests, it would be
reasonable to expect rather marked changes
in scores over the entire test period, 21
months to 18 years. We have, therefore,
prepared distributions of the range of sigma
score changes for the entire 16-year period
of testing. We find that the scores of three
children have increased between 4 and 4½
sigma (roughly between 70 and 79 I.Q. points,
assuming an approximate standard deviation
of 17.5 I.Q. points), and the scores of two
children have decreased a similar amount.
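The I.Q. equivalents quoted in this section come from multiplying a change in sigma score by the standard deviation assumed for the ages involved. That SD is about 13 points at 6 and 7 years and 17.5 points for the whole 21-month to 18-year span. The Python sketch below checks each conversion, including the "3 or 4 sigma (or over 50 I.Q. points)" the authors mention later.

```python
def sigma_to_iq_points(sigma_change: float, sd: float) -> float:
    """Express a change in sigma (standard) score as a change in I.Q. points."""
    return sigma_change * sd


SD_AGE_6_7 = 13.0   # approximate SD at ages 6 and 7, per the text
SD_OVERALL = 17.5   # approximate SD assumed for the whole testing period

print(sigma_to_iq_points(0.5, SD_AGE_6_7))  # 6.5, the average 6-to-7-year change
print(sigma_to_iq_points(4.0, SD_OVERALL))  # 70.0
print(sigma_to_iq_points(4.5, SD_OVERALL))  # 78.75, "roughly 79"
print(sigma_to_iq_points(3.0, SD_OVERALL))  # 52.5, "over 50"
```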


The sigma score curves for four of these five 
children are depicted in Figure 2D and Figure 
2E. The most interesting aspect of these
tremendous changes in scores is the fact that
the changes are made not abruptly but
consistently over a long period of time. However,
the greatest changes do occur on the preschool
tests. We have, therefore, prepared
distributions showing the range in sigma
scores and I.Q.'s between 6 and 18 years. No
child's sigma score changes as much as 4
sigma during the school years. But the scores of
one child (case 764, Figure 2D) change 3
sigma, and those of four others between 2.5 and
2.9 sigma.


Since educators and clinical workers use
I.Q.'s rather than standard scores, we have
prepared a distribution of the range of changes
in I.Q. during the 12-year period 6 to 18 years
for the two groups, Guidance and Control:

I.Q. Changes Between      Guidance    Control    Total
6 and 18 Years            n = 114     n = 108    n = 222
                             %           %          %
50 or more I.Q. pts          1           -         .5
30 or more I.Q. pts          9          10          9
20 or more I.Q. pts          …           …          …
15 or more I.Q. pts          …           …          …
10 or more I.Q. pts          …           …          …
9 or less I.Q. pts           …           …          …
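Apart from rounding, the Total column is the average of the Guidance and Control percentages weighted by group size (114 and 108). The Python sketch below recomputes it for the two rows where both group percentages are legible. It starts from percentages that were already rounded, so agreement to the nearest printed unit is all one should expect.

```python
def pooled_percent(p_a: float, n_a: int, p_b: float, n_b: int) -> float:
    """Percentage for two groups combined, weighting each group's percentage by its size."""
    return (p_a * n_a + p_b * n_b) / (n_a + n_b)


N_GUIDANCE, N_CONTROL = 114, 108

fifty_plus = pooled_percent(1, N_GUIDANCE, 0, N_CONTROL)     # 114 / 222
thirty_plus = pooled_percent(9, N_GUIDANCE, 10, N_CONTROL)   # 2106 / 222

print(round(fifty_plus, 1), round(thirty_plus))  # 0.5 9
```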


The mental test curve of the boy in the Guidance
Group whose I.Q. varied by more than 50
points of I.Q. is described in Figure 4 (case
…67). We are impressed not only by the extent
of the changes in I.Q. during the school years
but also by the fact that the results are so
similar for the two groups, Guidance and Control.
This similarity suggests that these figures are
reliable, that is, that they would probably be
duplicated under similar conditions of testing.
Changes in I.Q. of 30 or more points of I.Q.
are shown by 9 percent of the children in the
Guidance Group and 10 percent in the Control
Group. The I.Q.'s of over half of the children
showed a variation of 15 or more points of I.Q.
at some time during the school years, and
… the group varied as much as 20 points.


Although it is extremely important to point
out the possibility of marked changes in scores
in individual cases, it is equally important to
emphasize that the scores of … change only
slightly with respect to the group from one age
period to the next. And it is only when the
changes are consistently in one direction, or
the other, over a period of years that the range
of variation becomes as great as 3 or 4 sigma
(or over 50 I.Q. points).


Stability and … in the Mental Test Scores …


Mental test sigma score curves have been
drawn for all of the children in the Guidance
Study. In this sample of 252 children, we have
found individuals whose mental test scores
have remained relatively stable at either a
high, average, or low level over the entire
period of testing (21 months to 18 years). Other
children have shown highly inconsistent scores
in their mental test performance. Examples of
varying degrees of constancy of mental test
performance are shown in Figures 2, 3, and 4.


Consistently Low Scores (Figure 2A) 


Case 504, a girl, had a mother who attended college for two years and a father who graduated from college. The child was defective at birth, which showed up not only in tests (her I.Q.'s varied from 67 to 53, her sigma scores from -3 to -4½) but in her whole developmental history. Clinical diagnosis was microcephaly, probably secondary to prenatal injury early in the mother's pregnancy.


Case 945, a boy. This child's scores are nearly all between -1 and -2 sigma, but he has one I.Q. as high as 110 and one as low as 87. He comes from a minority group of relatively low economic status. His mother reached the tenth grade; his father finished high school. His life has been characterized by chronic emotional and economic strain and the divorce of his parents, by poor health of both parents and himself, by inadequate supervision, and by episodes of acute [...] which showed themselves in severe psychosomatic disturbances.


Consistently Average Test Scores (Figure 2B)











Case 783, a boy earning consistently average scores, presents the least variability in our group with respect to test scores (I.Q. 110 to 125, after the preschool years). What factors lead to this stability of performance? His health history shows, as an infant, impetigo, severe bronchitis, a critical thymus disturbance, and chickenpox. As a preschool child, he had frequent colds, obstructive adenoids, and infected tonsils (removed at 7). During his elementary school years, he had a chronic nasal discharge and a systolic murmur. His adolescent years showed poor dental hygiene and acne. At age 18 he had mumps, measles, and scarlet fever. But in spite of this history, he was energetic when well and interested in athletics.

The family situation was markedly substandard. Eight people, sometimes nine or ten, lived in four rooms, the boy sleeping in his parents' room until 11. The father was irregularly employed, insecure, drank too much,








JOURNAL OF EXPERIMENTAL EDUCATION 


and when drinking, got into trouble (fights and women). The mother was mature and steady but had a hernia and chronic endocarditis, worked away from home, and frequently worked far beyond her strength. The mother did not finish high school; the father had two years of college. Siblings of the boy attended the University; one graduated.


The boy was a bed-wetter until 18, had an acute stammer from 9 to 12, and was and is a chronic nailbiter. His school grades were considerably below his tested I.Q. He was ashamed of his home and his father, and since he had strong social affiliative interests (which his home precluded) and athletic interests (which his health and small stature interfered with), there was never a time in his history when he was not confronted with extreme frustration.

The question is again asked: why were his test scores stable, when other children with less disturbing histories showed such ups and downs? The question is raised but, unfortunately, not answered. One hypothesis is that because of chronic strain, internal and external, he persistently functioned on tests below his potential and at a level which he could reach without effort. Did he fail to vary, as other disturbed children did, because he was never free of tension, so that, unlike them, he never had the chance to earn a high score and show his real potential?


Case 976, showing consistently average and fairly stable test scores, is a girl of immigrant parents whose mother had schooling until 16 and whose father had schooling until age 12. The father provided a very substantial income and status to his family with his prestige-giving business success. Her mother, lonely in a new country, over-protected her daughter, especially in her early years. The mother slept in her daughter's room until she was 9. The girl's two highest scores (I.Q. 137) occurred at age 9 (following tonsillectomy) and [...] (a period of more freedom from home, more social opportunity and success). Her lowest score occurred during her preschool years.

Relatively Consistent High Scores


Case 423 presents a high-scoring girl whose mother is a normal school graduate and whose father obtained a graduate university degree. Her highest test score was obtained at 6. Her scores continued high, but sagged during late adolescence. She is attractive, artistically talented, and socially successful. She got very high grades in the [...] school years. Her [...] period of fatigue and poor posture and a period during which she strained to excel, and at 18, where her sigma score showed a drop. Her high school years were characterized by much less interest in intellectual success, which she regarded as unfeminine and as interfering with getting good dates; and her motivation in all test situations was markedly below that of her early years. Scholastic mediocrity was consciously sought and obtained, serving not only her date objective but her emancipatory revolt against parents who placed a very high value on grades and a very low value on her boy friends.


Case 524 is the daughter of parents who both had graduate university training and who presented their children with a self-contained and rich family-centered recreational program but with little outside social life. The girl showed continuously high scores, although there were frequent shifts of as much as .8 of a sigma over two-year periods and I.Q. variations from a low of 137 to a high of 162. Her most outstanding handicaps were tongue tie, corrected early; strabismus, corrected at 11½; and more or less chronic reserve. Sociometric findings appraised her as being the most quiet and reserved in her class. Her main interests were less in persons than in reading and [...]. Her highest score was obtained at 13, after menarche and at a time when she was [...] school work [...] her education-oriented family (including aunts, uncles, and [...] who had been active educators). Social participation, which was always a strain for her, was particularly restricted at this time.


Steadily Decreasing Mental Test Scores

Case 935 is a girl, an only child, who did well on tests at 21 months (I.Q. 120) but lost I.Q. points steadily at the rate of about 5 or 6 points per year, reaching an I.Q. of 64 (Wechsler-Bellevue) at 18 years. This is a decrease of 4½ sigma. School reports after the second or third grade showed consistent failures. Intensive physical tests, including encephalograms, disclosed no discoverable physical factors or disease processes. The estimated I.Q.'s of the parents are around 80 to 85; their education was at the grammar school level. The mother reports that she, as a girl, was ‘‘held over’’ several times. Chronic emotional and economic strain characterized the home after the early preschool years. Estimates of intelligence on the Rorschach and Thematic Apperception Test indicated average intelligence or only slightly below average. Her fantasy life is apparently richer than her overt test performance or school performance. The question is unsettled as to whether this is a case of mental deficiency or possible hebephrenia.





Case 764 is an example of a gradual lowering of I.Q. from 133 to 77, sigma scores from +1 to -3. She is an only child, born when the mother was 44 and the father 37. The estimated I.Q. of the mother is 65 to 70. The father is a skilled mechanic. The parents went to school until age 14.


Obesity began in late preschool years and increased steadily until medical advice was finally followed at age 14 (height 5 ft. 2 in., weight 160 lbs. at 13). Weight was normal at 17. There were, however, no I.Q. variations in relation to these physical changes.
She was always over-indulged by the mother, 
who lived to feed her and keep her young, and 
who was always complaining that her daughter 
never gave her enough affection. 


Cases with Increasing Scores (Figure 2E) 


Case 553 is a boy whose mental test scores 
increased from a preschool sigma score of 
-2 to later sigma scores of +2.4 in spite of a 
bad physical history. He is small-statured, 
thin, with very poor musculature, and pre- 
sents a history of early ear infections and 
chronic bronchitis from infancy---headaches 
(early glasses), stomach pains (appendectomy); 
he has had three operations and three serious 
accidents. He has had only one six-month period in his life free of illness. In spite of a frail frame, which has suffered many serious [...], an early strained family situation,

and relatively low mental test scores in his 
early preschool years, his tested ability 
steadily increased until 9, from which time 
he has maintained high and fairly stable scores. 
His mother is a normal school graduate; his 
father completed high school. His greatest 
security lies in his intellectual interests and 
achievements, but he has made good social 
adjustments and an amazingly good adjust- 
ment to his handicaps. 


Case 567. The early preschool history of this girl (the period of her lowest test scores) was characterized by the critical illnesses of her mother and brother and the emotional and financial strain that these entailed. Further, the girl had very poor muscle tonus and fatigue posture, and was very shy and reserved. At 6½ years she had pneumonia. From 10 on,
while still reserved, she had many supports 
in her life---music, athletic success, summer 
camps, the honor roll at school. Eighteen 
years marks her first year in college and 
away from home and her first really com- 
pletely satisfying social life, which resulted 
in great expansiveness. Both parents are 
college graduates who did advanced work in 
their fields. 





Highly Variable Scores (Figure 2F) 


Case 715, a girl whose I.Q. varies from 121 to 165, presents a history that was characterized by intermittent but severe eczema and asthma throughout the entire testing period. Ages 3 to 7 constituted a particularly bad period; age 9, where there was a drop in I.Q. of 12 points, was a period not only of asthma and poor vision but of acute economic insecurity and family uneasiness. At age 10, during her highest test period, she was taking two cc. of adrenalin daily to keep her asthma under control.

Added to health strain, social strain became more acute for her at 12 when she entered junior high school and continued through high school. It was a period of overweight, disfiguring eczema, and marked mother-daughter strain. Not only did the girl belong to a racial minority group, but she was not at one with them because of her marked intellectual interests, unshared by others in her racial age group, at school or in the neighborhood. Her mother is a high school graduate; her father had three years in college.

At the time of the 18-year test she was very much below par as she was recovering from an acute period of asthma.

Case 946, a girl, has varied in I.Q. between 142 (preschool) and 87. Her sigma scores have varied from +1½ to -2; her preschool scores were clearly higher than those of later years. Her lowest score (I.Q. 87, sigma -2) occurred at 9 years, a period of acute body concern and excessive modesty. Her immigrant parents, of [...] school education, both unstable and involved in chronic, acute marital tension, were divorced when the girl was 7. This child was acutely uneasy around her young stepfather for the first years of her mother's new marriage. Much internal as well as external turmoil has characterized her life.

In selecting mental test curves to include in this study, the writers were impressed by the fact that the children whose scores showed the greatest fluctuations were children whose life experiences had also fluctuated between disturbing and satisfying periods. Two such cases are 715 and 946 in Figure 2F. Two more cases whose scores show marked instability are presented in Figures 3 and 4.

There is only one further generalization that seems justifiable on the basis of the mental test curves of only 14 children: the records of all eight children showing consistent trends have final mental test scores which are similar to their parents' ability, as judged by the parents' education and socio-economic rating. Of the four children whose scores either decreased markedly or were consistently below the average for the group, three were from homes which were low on the socio-economic scale and had parents with less education than the average for this Berkeley sample. One common factor in the decreasing scores of cases 935 and 764 may be the stimulating effect of affectionate parents on their only children in the early years. These parents with less than average ability could not continue to offer intellectually stimulating environments as their children grew older, nor can the hereditary factors with which they were endowed be discounted. The microcephalic youngster with low scores was the child of parents with above average intelligence. This case should probably be considered the result of an intra-uterine disturbance unrelated to [...] or post-natal environmental factors.

On the other hand, the four children with increasing or consistently high scores had parents with more than average education (seven of the eight parents were college graduates). Superior hereditary and environmental factors were unquestionably contributing to the mental test records of these children.3


Summary and Conclusions 





A group of 252 children, who comprise a 
representative sample of the children living 
in an urban community, were given mental 
tests at specified ages between 21 months and 
18 years. These data have been analyzed to 

show the extent of the stability of mental test 
performance for this age period. The results 
may be summarized as follows: 


1. Mental test constancy for the age period 21 months to 18 years is markedly dependent upon the age at testing and the interval between tests. That is, group prediction is good over short age periods, and mental test scores become increasingly predictive after the preschool years.


2. Test-retest correlations are as high for 
children tested on different forms (L 
or M) of the 1937 Stanford Revision as 
for children tested on the same form 
over the same age periods. 


3. Distributions of the extent of the changes in I.Q. for the age period 6 to 18 years show that the I.Q.'s of almost 60 percent of the group change 15 or more points; the I.Q.'s of a third of the group change 20 or more points; and the I.Q.'s of 9 percent of the group change 30 or more points. The I.Q.'s of 15 percent of the group change less than 10 points. The group averages, on the other hand, show a maximum shift in the mean I.Q. over this age period from 118 to 123.





4. Some individuals show consistent upward or downward trends in I.Q. over a long period, resulting in changes as great as 4½ sigma or 50 I.Q. points.


5. Inspection of the mental test curves of 
the individual children included in this 
paper indicates that changes in mental 
test scores tend to be in the direction 
of the family level, as judged by the 
parents’ education and socio-economic 
status. (Group findings showing an in- 
creasing relationship of family status 
to the children’s test scores were pre- 
sented in an earlier study (6).) 


6. Children whose mental test scores 
showed the most marked fluctuations 
had life histories which showed unusual 
variations with respect to disturbing 
and stabilizing factors. However, there 
were other children whose scores re- 
mained constant despite highly disturbing experiences.


In conclusion, it should be re-emphasized 
that, whereas the results for the group suggest 
mental test stability between 6 and 18 years, 
the observed fluctuations in the scores of in- 
dividual children indicate the need for the ut- 
most caution in the predictive use of a single 
test score, or even two such scores. This 
finding seems of especial importance since many plans for individual children are made by schools, juvenile courts, and mental hygiene clinics on the basis of a single mental
test score. Specifically, it could be noted 
that a prediction based on a 6-year test would 
be wrong to the extent of 20 I.Q. points for one 
out of three children by the age of 18 years, 
and to the extent of 15 I.Q. points for approx- 
imately six out of ten children. 
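The percentages quoted in the summary and in this conclusion are proportions of children whose I.Q.'s spread over at least a given number of points at some time between 6 and 18 years. A minimal sketch of that tabulation follows. It assumes each child's record is a mapping from age in years to I.Q., takes the spread as the highest minus the lowest I.Q. within the age window (one reasonable reading of "the extent of the changes"), and uses four invented records rather than the study's data; the names `iq_range` and `share_changing` are hypothetical.

```python
def iq_range(iqs_by_age, first_age=6, last_age=18):
    """Highest minus lowest I.Q. recorded between first_age and last_age."""
    window = [iq for age, iq in iqs_by_age.items() if first_age <= age <= last_age]
    return max(window) - min(window)


def share_changing(records, thresholds=(10, 15, 20, 30)):
    """Fraction of children whose I.Q. range reaches each threshold."""
    ranges = [iq_range(rec) for rec in records]
    return {t: sum(r >= t for r in ranges) / len(ranges) for t in thresholds}


# Invented records (age in years -> I.Q.), not data from the study.
records = [
    {6: 110, 8: 118, 12: 131, 18: 125},  # range 21
    {6: 100, 10: 104, 18: 97},           # range 7
    {4: 90, 6: 120, 14: 152, 18: 140},   # age 4 excluded; range 32
    {6: 115, 12: 130},                   # range 15
]
print(share_changing(records))  # {10: 0.75, 15: 0.75, 20: 0.5, 30: 0.25}
```

Reading the conclusion strictly as prediction from the 6-year score would instead compare each later I.Q. with the score at 6; the same structure applies with that measure swapped into `iq_range`.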
Figure 1
FREQUENCY DISTRIBUTIONS OF I.Q.'S AT DIFFERENT AGES

MENTAL TEST PERFORMANCE OF CASE 802 (BOY)
TABLE I
TEST ALTERNATION PLAN IN THE GUIDANCE STUDY

Age Level              Tests Given to Children Whose Birthdates Occur in:
in Years   Group      Jan.-June, 1928   July-Dec., 1928   Jan.-June, 1929
 9         Guidance   Stanf. Form L     Stanf. Form M     Stanf. Form M
           Control    Stanf. Form L     Stanf. Form L     Stanf. Form M
10         Guidance   Stanf. Form M     Stanf. Form L     Stanf. Form L
           Control    Stanf. Form M     Stanf. Form M     Stanf. Form L
12         Guidance   --                Stanf. Form L     Stanf. Form M
           Control    --                Stanf. Form L     Stanf. Form L
13         Guidance   Stanf. Form M     --                --
           Control    Stanf. Form L     --                --
14         Guidance   --                Stanf. Form M     Stanf. Form L
           Control    --                Stanf. Form L     Stanf. Form M
15         Guidance   Stanf. Form M     --                --
           Control    Stanf. Form M     --                --
EFFECT OF CHANGE OF TEST ON THE STABILITY OF MENTAL TEST SCORES

Age Levels Compared          Tests Given                          n      r
8 x 9 years                  Form L x Form L                     87    .91
                             Form L x Form M                    100    .93
8 x 10 years                 Form L x Form L                    105    .88
                             Form L x Form M                     83    .88
8 x 12 or 13 years           Form L x Form L                    117    .85
                             Form L x Form M                     64    .82
9 x 12 or 13 years           Form L x Form L                     49    .90
                             Form M x Form M                     32    .79
                             Form L x Form M or Form M x L      101    .89
10 x 12 or 13 years          Form L x Form L                     70    .87
                             Form M x Form M                     38    .91
                             Form L x Form M or Form M x L       48    .89
12 or 13 x 14 or 15 years    Form M x Form M                     33    .88
                             Form L x Form M or Form M x L      114    .89

Average r when same form of test repeated                              .87
Average r when different form of test given                            .88
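The paper does not say how the two averages at the foot of the table were formed. As a check on the transcription, a plain unweighted mean of the r values listed for each kind of comparison, rounded to two places, gives .87 for the same form repeated and .88 for different forms, which matches the printed figures. The sketch below shows that arithmetic together with a Fisher z average, a common alternative way of averaging correlations; treating the printed averages as unweighted means is an assumption, and `simple_mean` and `fisher_mean` are hypothetical helper names.

```python
import math

# r values read from the table above, grouped by kind of comparison.
same_form = [0.91, 0.88, 0.85, 0.90, 0.79, 0.87, 0.91, 0.88]
different_form = [0.93, 0.88, 0.82, 0.89, 0.89, 0.89]


def simple_mean(rs):
    """Unweighted arithmetic mean of correlation coefficients."""
    return sum(rs) / len(rs)


def fisher_mean(rs):
    """Average correlations via Fisher's z = atanh(r), then back-transform."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


print(round(simple_mean(same_form), 2))       # 0.87
print(round(simple_mean(different_form), 2))  # 0.88
print(round(fisher_mean(same_form), 3), round(fisher_mean(different_form), 3))
```

The two methods need not agree in the second decimal place, so the Fisher figures are shown only for comparison; either way, the same-form and different-form averages remain close, which is the point the table makes.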
A REVIEW OF PROCEDURES FOR THE EVALUATION OF THE OUTCOMES OF
TEACHING ENGLISH


ROBERT M. W. TRAVERS 
University of Michigan 


Introduction 


The purpose of this paper is to review tech- 
niques for evaluating the outcomes of the teach- 
ing of English primarily at the high-school and 
college freshman level. While many measur- 
ing techniques have been developed in this area 
during the last thirty years, they have often 
been published in journals that are not widely 
available and in at least one notable case (27) 
have remained unpublished. The following 
pages review these new evaluation techniques 
and discuss some of their merits and limita- 
tions. 


The Evaluation of Outcomes Related to 
Expression 


For the purpose of this review of evaluation 
procedures, it is convenient to think of two 
kinds of outcomes of the teaching of English: 
outcomes related to the verbal expression of 
ideas, and outcomes related to the understand- 
ing of spoken and written language. 


The measurement of the student’s ability to 
express ideas in various forms has usually 
formed the basis of evaluating student progress 
in the language arts, and for generations, the 
essay or theme has been the English teacher’s 
stock in trade for measuring this outcome. 


The objectives which teachers attempt to 
achieve or to measure through theme writing 
are numerous and varied. They include the
development of such abilities as the following: 


The ability to write grammatically correct 
English. 


The ability to organize ideas. 
The ability to express interesting ideas. 





The ability to transmit concepts from one 
person to another by means of accurate 
verbal description. 


The ability to express ideas in a form 
which is both clear and interesting. 
The ability to produce an aesthetically
worthwhile composition. 


The ability to spell correctly. 
The ability to punctuate and paragraph. 


All of these abilities and many others, it is
claimed, can be measured adequately in a stu- 
dent’s theme. The traditional method of ap- 
praising a student’s composition is for the 
teacher to make an over-all evaluation of it on 
the basis of general impression. By this meth- 
od the teacher is appraising the composition in 
terms of his own general concept of what a de- 
sirable composition should be. Sometimes the 
teacher’s concept of a good composition is fairly 
clear and can be made explicit, but often it has 
been formulated only in general terms. The ev- 
idence indicates that over-all impressions are 
usually based on rather superficial qualities of 
the composition. If an over-all appraisal of a 
composition is to have definite meaning, it is 
necessary to define the factors which enter into 
the total evaluation and the weight to be given to 
each factor. For example, if the teacher is to 
evaluate a composition in terms of the factors 
listed above, it would be necessary to determine 
just what weight is to be attached to each of 
these factors. This is a way of stating that all 
teaching is aimed at a number of objectives 
but that some objectives are stressed more 
than others, and that an evaluation instrument 
should give the greatest emphasis to those ob- 
jectives that receive the greatest stress. For 
example, Hartog (15) defines seven things that 
should be measured in a composition. These 
seven are sense, spelling, punctuation and 
paragraphing, grammar and syntax, accuracy 
and vocabulary, power of expression, and the 
general impression given by the total compo- 
sition. He recommends measuring each of 
these items separately before combining them 
into a total, and he believes that the examiner 
should determine what weights are to be assign- 
ed to each category. The various composition 
rating scales which have been developed (3, 16, 
18, 19, 20, 21, 32, 33, 35, 37) are all based on 
this kind of concept. They all attempt to break 
down a composition into various measurable
components and to give a weight to each of 
these components. While analytic procedures 
do have some merit in that they force the 
teacher to make an analysis of what is being 
measured, composition rating scales have not 
had very widespread usage mainly for the 
reason that they have stressed analysis to the 
point where the scoring of a composition re- 
quires more time than the teacher can profit- 
ably devote to it. Consequently, in the last two 
decades there has been a tendency for many 
writers, including Stalnaker (28, 29, 30), to 
recommend a compromise between the detailed 
analytic rating-scale method and the over-all 
general impression method. In this compro- 
mise method the teacher rates the composition 
on the basis of over-all impression but defines 
the main things that are to be kept in mind in 
forming this impression. 
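Hartog's recommendation, to rate each of the seven aspects separately and then combine them with weights chosen by the examiner, amounts to a weighted average. A minimal sketch follows. The seven category names come from Hartog's list as given above, while the 0-10 rating scale, the example weights, and the function name `weighted_composition_score` are illustrative assumptions, not part of any published scale.

```python
HARTOG_CATEGORIES = (
    "sense",
    "spelling",
    "punctuation and paragraphing",
    "grammar and syntax",
    "accuracy and vocabulary",
    "power of expression",
    "general impression",
)


def weighted_composition_score(ratings, weights):
    """Combine separate category ratings into one total as a weighted
    average; the examiner decides how much each category counts."""
    missing = [c for c in HARTOG_CATEGORIES if c not in ratings or c not in weights]
    if missing:
        raise ValueError(f"no rating or weight for: {missing}")
    total_weight = sum(weights[c] for c in HARTOG_CATEGORIES)
    weighted_sum = sum(ratings[c] * weights[c] for c in HARTOG_CATEGORIES)
    return weighted_sum / total_weight


# Hypothetical examiner who counts "sense" twice as heavily as each other aspect.
weights = {c: 1 for c in HARTOG_CATEGORIES}
weights["sense"] = 2
ratings = {c: 4 for c in HARTOG_CATEGORIES}  # ratings on an assumed 0-10 scale
ratings["sense"] = 8
print(weighted_composition_score(ratings, weights))  # 5.0
```

Changing the weights changes which objectives the total stresses, which is the point made above about giving the greatest emphasis to the objectives that receive the greatest stress in teaching.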


It should be noted that some of the skills in- 
volved in written compositions cannot be evalu- 
ated at all. There seems no possible way by 
which one can measure the extent to which an 
individual is able to express his own ideas 
clearly since apparent inadequacy of expression 
may be either a symptom of lack of ideas or a 
symptom of the inability to express ideas. This 
situation is also complicated by the fact that 
little is known yet concerning the extent to which 
attempts at expression may serve the purpose of 
forming ideas. The expressive act is, in itself, 
part of the thinking process. 


Up to this point, only the teacher’s evaluations 
of a student’s product have been considered. 
There are, however, judgments other than 
those of the teacher which may be given con- 
sideration in evaluating a student’s literary 
efforts. One writer (15), in discussing this 
question, points out that the usual formula for 
developing writing skills is the formula of writ- 
ing ‘‘anything about something for anybody’’ in 
which only the something is specified. When 
this procedure is used as a teaching device it 
becomes very difficult to appraise the student’s 
accomplishment because neither the student nor 
the teacher knows precisely what is to be 
accomplished. People outside of school never 
have to write ‘‘anything about something for 
anybody,’’ and most adults would find such a task difficult, if not impossible. In life outside of school, people write with a particular audience in view, and most of the things they write are rather short. In the world outside of
the schoolroom, the success of a written com- 
position is judged, not always in terms of the 
kinds of judgments made by teachers about the 
composition of students, but in terms of whether 
the composition achieved a particular objective. 


This is an important point for it stresses the 
fact that expression can be evaluated adequate- 





ly only in terms of the purpose which it serves. 
There has been a marked tendency for certain 
writers on the subject of the teaching of English 
to stress this fact. Hartog (15), for example, 
takes the view of the great Dr. Johnson that the 
usual form of essay is necessarily ‘‘an irreg- 
ular, ill-digested piece,’’ and is inclined to 

feel that the student should always write with a 
clearly defined purpose in view. He should 
know whether his composition is to be written 
in order to inform, or to amuse, or to inspire, 
or to perform some other function, and his 
composition should be evaluated by the audience 
consisting usually of classmates who decide 
whether it did actually inform them, or inspire 
them, or amuse them. The present writer be- 
lieves that this method of appraising student 
composition has some merit since all writing 
is done with a particular purpose in view and 
evaluations are performed in terms of the suc- 
cess of the composition in achieving that partic- 
ular purpose. From this viewpoint, composi- 
tions should be rather short, for apart from the 
fact that most written compositions made to 
meet daily needs actually are short, it is much 
easier to make evaluations of short composi- 
tions than of long compositions. 


There is one aspect of written compositions 
where new methods of appraisal are needed. 
That is the appraisal of the aesthetic quality of 
written composition. In this connection, Dewey 
makes a useful distinction between an expression and a statement (5). Dewey distinguishes
expressions which have aesthetic value only 
from statements which serve primarily to in- 
form. While the scientist makes statements, 
the artist makes expressions. The scientist 
makes statements which tell how to do certain 
things, but the expressions of the artist do not 
direct the observer what to do: they are made 
because they produce a certain experience in 
those who perceive them. John Dewey (p. 85) 
states, for example, that ‘‘the poetic as distinct 
from the prosaic, aesthetic art as distinct from 
the scientific, expression as distinct from state- 
ment, does something different from leading to 
an experience. It constitutes one.’’ Up to this 
point the discussion of the evaluation of written 
compositions has been limited to the evaluation 
of the student’s ability to make statements. 
However, techniques for evaluating the student’s 
ability to make aesthetic expressions must also 
be considered. 


The ability to produce an expression which 
has aesthetic value as distinct from a statement 
has rarely been measured with any accuracy 
and there is little in the technical literature on 
the measurement of this ability. However, 
there is one recent study of the ability of child- 
ren to make artistic expressions which presents 
results of great significance in the present con- 
nection (27). The author of this study starts out 
with the significant observation, fairly well established by facts, that children often write poetry more easily than adults and that the poetry
they write is often of remarkably high quality. 
Although there is good evidence that a considerable fraction of all children can be creative in
the language arts at a relatively early age, little 
has been done to measure this ability so that 
those who have it to an outstanding degree may 
be identified and followed through the years. 
The need for fostering this ability is evident 
from the tendency for it to decline during adol- 
escence when the poetic expressions of children 
lose much of their spontaneity, originality, and 
charm. In comparison with the preadolescent, 
the poetry of the older child is dull and prosaic 
and for the most part, by the time he reaches 
adulthood, his ‘‘aesthetic possibilities have been
atrophied beyond resurrection’’ (p. 34). 


A major difficulty in the appraisal of the free 
poetic compositions of children is that the poem 
of one child is so wholly different from the 
poem of another that the two productions are 
incommensurable. They must be evaluated on 
different scales because of differences in form, 
in content, and in other important variables. 

It becomes much easier to compare the poems 
of different children if the poems are written 
within the same frame of reference. An ingen- 
ious attempt to do this is described in Smith’s 
study. The evaluation technique was that of 
presenting the student with a part of a poem and 
asking him to add a few lines of original com- 
position. For example, in one of the questions 
the student was presented with the line, ‘‘I saw 
old autumn in the misty morn,’’ and was asked 
to add three lines to continue the opening. Here 
are examples of how two students completed 
this task: 


Example 1 


I saw old autumn in the misty morn 
Adorned in fading glories, like a king 

Grown old with age; the flaunted red of dawn, 
The gaudy leaves, concealed a dying thing. 


Example 2 


I saw old autumn in the misty morn 

With grey-eluded sky in the dreary dawn 

But the sun soon came up and the clouds 
went away, 

And eventually it was a lovely day. 


There can be little doubt that the second of these 
productions is greatly inferior to the first. It
seems to be fairly easy for teachers to evaluate 
the merit of a series of these compositions. 
Study has shown that, when two teachers make 
independent evaluations of the merit of the stu- 
dent’s responses in this test, there is rather 
good agreement between the evaluations. The
reliability of the appraisal process in this situ- 
ation is high and in marked contrast to the un- 
reliability of the process of grading poems pro- 
duced in situations where no restrictions are 
placed on the kind of creation which the student 
is expected to produce. 
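Agreement between two teachers who mark the same responses independently is commonly summarized as a correlation between their two sets of marks. The text does not report which statistic was used in Smith's study, so the Pearson coefficient below is an assumption, the marks are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of marks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists of at least two marks")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Invented marks (out of 10) given by two teachers to the same five responses.
teacher_a = [8, 3, 6, 9, 4]
teacher_b = [7, 2, 6, 9, 5]
print(round(pearson_r(teacher_a, teacher_b), 2))  # 0.95
```

A coefficient near 1 means the two teachers rank the responses in nearly the same order; for unrestricted poems, the contrast drawn in the text would show up as a much lower figure.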


Another of the remarkable things about this 
type of measuring technique is that the children 
are able to produce so much original material 
in so short a space of time. In the test descri- 
bed by Smith, the children were required to add 
two, three, or four lines to nineteen different 
selections. It is surprising to find that many 
children were able to do this entire task in less 
than 35 minutes and that the younger children 
in secondary schools had much less difficulty 
with the task than older children. The children 
used in the experimental work on the test would 
correspond roughly in ability to those in an
academic curriculum in an American secondary 
school. 


The possibilities inherent in the free-answer type of measuring instrument have not been fully recognized by those concerned with educational measurement. Unfortunately, the emphasis on the development of objective best-answer tests has largely obscured from view the possibility of other forms of measurement. One result of this tendency has been a quite widespread attempt to use objective tests for measuring indirectly the ability of the student to write or to speak. It is argued, on the basis of rather flimsy evidence, that if, for example, a student can discriminate between a clearly worded statement and an ambiguous statement, he is more likely to write clear statements than if he cannot make the discrimination. Numerous objective tests of the mechanics of expression have been developed for the purpose of measuring by very indirect means the ability of the student to communicate ideas. Other objective tests have attempted to measure the ability to organize ideas (10). The development of these tests has been largely an unhealthy influence for two reasons. First, it has encouraged a very indirect method of measuring the outcomes of instruction. Second, many of these tests have not only provided a poor basis for evaluating the student’s communication skills but have also encouraged the perpetuation of many undesirable teaching practices. Fries (12) has pointed out in this connection that the many language usage or grammar tests which have been produced are based on so-called common errors, but that a large fraction of these so-called common errors are accepted by scholars as satisfactory usage (6). “On the whole,” Fries states, “the methods of the present approach in the schools assume that the problem of language usage is a simple one of mistakes and correct language which can easily be separated according to perfectly definite measures and therefore they attempt to make the pupil ‘conscious of the rules’ by which to determine the correct language. From the scientific point of view in language, however, such methods are fundamentally unsound for language usage cannot be separated into two simple classes.”


Probably the only major exception to this 
last statement is in the use of objective spell- 
ing tests where there is ample evidence to 
show that the student’s ability to spell can be 
measured much more adequately through the 
use of an objective test than through the per- 
usal of the student’s written composition. 


Before leaving the field of composition, something must be said about the evaluation of oral compositions. Objectives related to oral composition have been given increased emphasis at all levels in recent years because it has at last been recognized that in everyday life there is a greater need for oral composition than for written composition. However, the evaluation of oral compositions presents all the difficulties associated with the evaluation of the written composition and includes an additional difficulty as well, namely, the fact that oral compositions are fleeting, transitory phenomena which cannot be reviewed repeatedly for evaluation purposes unless they are recorded. Much can be done, however, to improve the rather rough evaluations of oral compositions which are commonly made. Improvements can be made by standardizing the situation in which the oral composition is made and by evaluating the compositions through comparison with a graded series of compositions. At least one attempt to do this has been published in a reputable educational journal (23). In this article, it was pointed out that just as the quality of a sample of handwriting may be measured by the Thorndike Scale (31) by comparing it with a series of samples graded in quality, so too may the quality of a student’s oral composition be measured by comparing it with a series of samples. The article goes on to describe scales for classifying grade school pupils’ verbal responses to a story, to a picture, or to an object. While this technique is still experimental, it does seem to be a profitable one to develop. However, teachers who use this measuring technique will still have to develop their own scales, since the recorded scales used in the study cannot be purchased. In this connection, it is worth noting that various workers (2, 14, 17) have developed paper-and-pencil scales for measuring certain aspects of oral expression. The rating scales thus developed have been published in journals and are available for use.








Appraising the Student’s Understanding of Written and Spoken Language

Up to this point, discussion has been centered on the appraisal of oral and written compositions. Consideration must now be given to the appraisal of the development of the student’s understanding of the ideas of others conveyed through written language or speech. Reading comprehension and oral comprehension are important objectives which all teachers, regardless of field, should attempt to achieve. Since emphasis in schools has been on the development of reading skills rather than on listening skills, most of the work on appraisal has been done in the area of reading rather than in the area of listening. As a matter of fact, there has been only one major educational research study on the measurement of listening skills, though there are thousands on the appraisal of reading skills (13). The development of listening skills remains very largely an unexplored field.


Reading skills have been most adequately 
measured at the elementary school level where 
the main function of these measurements has 
been for diagnostic purposes--that is to say, 
for the purpose of determining the causes of 
reading difficulties. Special mechanical devices 
have also been devised for studying the mechanics of the reading process, but Traxler, after a careful review of the results achieved with such devices, concluded that there was little
evidence to justify their use. In any case, the 
various mechanical devices produced for study- 
ing the reading difficulties of children are not 
practical devices for the classroom teacher 
and even in the hands of the expert are of ques- 
tionable value. 


The measurement of reading skills at the 
secondary school and college level is much 
less satisfactory than at the elementary school 
level. This is evident from the fact that differ- 
which a single reading comprehension test may
attempt to measure, the following partially 
complete list of objectives is given: 


Recognizing the purpose of the author.
Identifying the main ideas conveyed.
Identifying the main techniques used by the writer in transmitting his ideas.
Recognizing the author’s philosophy.
Recognizing the meaning of certain figures of speech.
Identifying the meaning of particular words.
Identifying errors in reasoning.
Identifying references made by the author.
Identifying the main conclusion of the author.
Identifying the meaning of particular words as they are used in context.
Identifying implications of the author’s arguments.
Understanding the nature of the characters portrayed by the author.
Recognizing humor.
Recognizing the use of alliteration and other devices used by the author.


Probably the best developed tests of reading 
comprehension at the secondary school level 
are those developed by the Cooperative Test 
Service. The Cooperative Tests of Reading 
Comprehension cover a broad range of objec- 
tives such as those just discussed and have val- 
idity in the sense that they determine whether 
the student can do some of the important things 
in reading which teachers try to develop. They 
measure not only the ability of the student to 
understand written material but also, to a limited extent, his appreciation of many of the literary qualities of the material. These tests are not only readily obtainable but also reasonably priced, and even if a teacher does not wish to use them, they are worth examining as a source of ideas on evaluation procedures.


One of the interesting outcomes of the use of tests of reading comprehension is the finding (13) that measures of reading comprehension are related to measures of listening comprehension: there is a tendency for the poor reader to be the poor listener. Until good tests of listening comprehension have been developed, tests of reading comprehension may be used with some validity for predicting skill in that area. The lack of adequate instruments for measuring listening comprehension reflects the fact that little is known concerning the extent to which listening skills can be trained.


A special problem in the appraisal of reading comprehension is the appraisal of the student’s understanding and appreciation of poetry, and it is a field of measurement which has attracted some of the most ingenious workers. Abbott and Trabue (1), who carried out a pioneer investigation in this field, considered that an important indication of the appreciation of poetry would lie in the individual’s ability to distinguish good poetry from poor poetry. It seems reasonable to suppose that a person who cannot distinguish between poetry judged by experts to be good and poetry judged to be poor could not have any appreciation of poetry; such critical discernment is an important constituent of the appreciative process.


Abbott and Trabue selected a large number of poems which are recognized as works of considerable merit. These poems were modified in ways that destroyed their original worth in varying degrees. Some were converted into sentimental versions of the original, some into prosaic versions. Altogether three new versions of each poem were prepared and then, together with the original, were submitted to a number of experts including poets, literary editors, critics, and professors of literature. If these experts did not agree that the original was the best of the four versions, then the series was discarded. In some cases, it seems, a change in a poem actually improved it.


After four versions of each of over a hundred 
poems had been so examined, two short series 
of thirteen poems each were selected for the 
two forms of the final test. Each of these ser- 
ies represented material of graded difficulty 
from Mother Goose up to Milton or Browning. 
In the test itself each page was devoted to the four versions of a single poem, and the subject was instructed to “Read the poems, A, B, C, D, trying to think how they would sound if read aloud. Write ‘Best’ on the dotted line above the one you like best as poetry. Write ‘Worst’ above the one you like least.”


These tests were given to a large number of 
children in elementary and high schools and to 
students in college in order to determine how 
far young people of different ages were able to 
discriminate the merits of one version of a 
poem over those of another. It is obvious that 
the student whose taste corresponded most 
closely with that of the experts would be able to 
choose correctly the best version of each of the
thirteen series and would thus score thirteen 
points. On the other hand, the person who just 
guessed would score on the average between 
three and four points. At all levels it was 
found that an appreciable proportion of the sub- 
jects tested did no better than they would be ex- 
pected to do by chance. For example, amongst 
fifty-six children in the fifth grade twenty-nine 
scored less than they might be expected to 
score by chance, and the fine qualities of the 
best version of these poems won recognition 
from only a relatively small fraction of the 
college group. In the higher grades of the el- 
ementary school there was practically no appre- 
ciation of the merits of the various versions of 
the poem, and it was difficult to see how children could possibly appreciate such poetry when they were unable to distinguish the good versions from those that were both atrocious and idiotic.
The situation in the high school was not much 
better and children still showed little taste, and 
even in college a large number of students were 
unable to distinguish good and poor versions of 
relatively simple poems. 
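The chance figure quoted above follows from the design of the test. A subject who picks “Best” at random among four versions is right with probability 1/4 on each of the thirteen series, so the expected chance score is 13 × 1/4 = 3.25, which lies between three and four points. The sketch below makes that arithmetic explicit, assuming that each choice is an independent random guess; under that assumption the number of correct choices follows a binomial distribution. The function name `chance_distribution` is hypothetical.

```python
from math import comb


def chance_distribution(n_items, n_choices):
    """Probability of each possible score (0..n_items) for a pure guesser.

    Assumes every item is answered by an independent random choice
    among n_choices versions, exactly one of which counts as correct.
    """
    p = 1 / n_choices
    return [comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
            for k in range(n_items + 1)]


n_items, n_choices = 13, 4                # thirteen series, four versions each
expected = n_items / n_choices            # 13 * 1/4 = 3.25 points
dist = chance_distribution(n_items, n_choices)
p_three_or_fewer = sum(dist[:4])          # scores of 0, 1, 2, or 3

print(f"expected chance score: {expected}")
print(f"P(score <= 3 by guessing): {p_three_or_fewer:.3f}")
```

Under these assumptions a pure guesser scores three points or fewer, that is, below the chance average of 3.25, roughly 58 percent of the time.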


Not only did students of certain ages fail to 
choose the best versions but many of them 
showed a definite preference for the poor ver- 
sion and particularly those poor versions in 
which the original poem had been sentimental- 
ized. There was a steady increase in the pref- 
erence shown for the sentimental versions 
throughout high school. This evaluation tech- 
nique indicates that the poetic material which 
children in high school were expected to appre- 
ciate had no relationship at all to the interests 
of their age. It seems almost self-evident that 
children cannot develop an appreciation for 
material which they do not understand, which 
has for them no real interest, and which they 
describe as dull and unintelligible. Much reading of poetry is reminiscent of the Kentucky mountain children who were reading the Canterbury Tales. Before leaving this technique, it may be noted that there has recently been increased interest in the study of children’s preferences in form and style (26, 36).


However, the failure of children to appreciate 
and enjoy much of the poetry with which they 
come into contact does not seem to be a result 
of the nature of poetry but a consequence of the 
type of material that is selected for school 
children. This was indicated rather clearly in 
an evaluation study by Rose Manicoff (22) which 
showed that the intensive study of suitable po- 
etry in the seventh, eighth and ninth grades re- 
sulted in an increased interest in poetry and an 
increase in the amount of poetry read voluntarily.


The children who had poetry read to them regularly also showed a greater increase in their scores on a test of poetry appreciation than the children who did not have this experience.


In the Manicoff study, the extensive reading of poetry also seemed to have some effect on the amount of creative writing undertaken by students; students who had never written poetry or verse produced some during the experiment. In a control group there was almost no change in creative writing.


Most studies of the development of the appreciation of poetry have measured appreciation in terms of the ability of the student to distinguish verse that is judged good by experts from verse that is judged as poor. Although the making of such discriminations is an important constituent of appreciative experiences, it seems to the present writer that it is not the only aspect of such experience that is worthy of appraisal. Such judgments of worth could no doubt be made with accuracy without being accompanied by any experience of enjoyment. Enjoyment seems to be one of the most important factors in the process of appreciation. In this connection it is interesting to note that the study by Manicoff provided some evidence that an increase in the ability to appreciate poetry was connected with an increase in the enjoyment that was derived from poetry.


Newer Approaches to Evaluation 


The evaluation movement (4), in contrast to the objective examination movement, has emphasized that in most situations objective tests represent indirect methods of measurement and should be used only when direct methods of measurement cannot be applied. A notable development in evaluation during the last fifteen years has been a tendency to appraise student achievement not so much in terms of paper-and-pencil classroom tests as in terms of behavior outside of the classroom. The argument has been that if education is to be considered as preparation for life as well as a sample of life itself, then the success of an educational project must be evaluated in terms of the subsequent behavior of the students outside the school. For example, if one of the objectives of a course is the development of study habits, then it is much better … the educational value of objective tests would be much improved if they were generally regarded as substitutes for better and more direct methods of appraising pupil development.


An excellent example of this modern approach to educational problems is found in the Eight Year Study (26) of the Progressive Education Association. In this study an attempt was made to appraise not only those outcomes of English instruction which can be measured by paper-and-pencil tests but also the other important outcomes which most teachers recognize but which most do not measure. For example, an attempt was made to appraise the extent to which instruction in English had developed desirable interests in reading. An investigation was made to determine whether the reading of students was abundant, varied as to type of content, appropriate to the needs of the reader, and mature in terms of difficulty, complexity, and depth of insight. A careful record
was kept of the books which each pupil read 
voluntarily and the extent to which the pupil en- 
joyed each book. The maturity of the student’s 
reading interest was measured by means of a 
scale based on one developed by Jeanette H. 
Foster (11). The Foster Scale classified 250 
authors into six different levels of maturity 
according to the characteristics of the readers 
who enjoy them most. By the use of this scale 
it was possible to measure the maturity of each 
student’s reading interests. In the Eight Year 
Study the Foster Reading Scale was extended to 
cover 1,000 authors. 


In the Eight Year Study an attempt was also 
made to evaluate the reading of magazines 
undertaken by pupils. Magazine reading was 
evaluated only once or twice a year. The evaluation procedure was to give the pupils a check list which included the 100 magazines covering about 94 percent of the magazine reading done by high school pupils (9). In the check list they
were asked to indicate how often they read each 
magazine, how thoroughly they read it, and 
where they obtained it. In addition they were 
asked to state which magazines they preferred 
to read. The maturity level of each magazine 
was determined. 


In addition an attempt was made to appraise 
the student’s reading of newspapers by determ- 
ining what papers were read, the amount of 
time devoted to newspaper reading and the sec- 
tions of the paper which were read most regu- 
larly. An incidental and interesting outcome of 
this appraisal was that very few students could 
identify the political policy of the newspaper 
they read. 


The Eight Year Study is also of interest to 
English teachers because it tended to develop 
new methods of appraising the student’s appre- 
ciation of literature. In this field as in other 
fields the Evaluation Committee attempted to 
obtain evidence of appreciation or of lack of 
appreciation wherever it could be found. Form- 
al examinations represented only a single and 
minor method of collecting evidence of student 
development in this field. The evaluation staff 
of the Eight Year Study attempted to make a 
careful analysis of what is meant by literary 
appreciation at the secondary school level and 
developed evaluation instruments on that basis. 


The main method used to appraise pupil appre- 
ciation of literature was to develop a question- 
naire which was used for determining students’ 
reactions to their reading. The questionnaire 
on voluntary reading attempted to measure 
seven major aspects of the material which the 
student had read on a voluntary basis in his leisure hours. These seven aspects were the satisfaction derived from reading, the desire to read more of the same kind of material, the desire to know more about the things read, the desire to make creative expressions, the desire of the student to identify himself with the situation described in the reading material, the desire to think more about what has been read, and the desire to make fair judgments about the reading material.


This approach to the appraisal of the outcomes of instruction in English represents a healthy departure from the traditional type of essay examination in which the pupil is asked to discuss some aspect of his reading. It has been widely used at the high school, general college, and college levels (7, 8). However, this new approach is possible only in the kind of situation in which the student and teacher are working closely together and in which the student knows that appraisals of his work are made, not for the purpose of assigning a mark, but for the purpose of helping him to obtain all he can out of the course.


A similar approach was taken by Dora Smith in her study of the outcomes of English instruction in elementary schools in the State of New York (25), in which she investigated the out-of-school reading of the pupils. Her findings indicate the kinds of things she tried to measure by means of questionnaires. These findings are summarized in the following paragraph (25, p. 37):

“Pupils in the seven towns show a general lack of knowledge of good current books for children. Evidence indicates that, as pupils progress through the elementary grades, the schools are not conscious of the competition of inferior books suited to children’s interests, which are read in increasingly greater numbers than are good wholesome books for children..... The books read by boys are inferior to those read by girls.” “....only one pupil in sixteen read a current-events magazine during the three weeks of the study.”


Similarly, questionnaire techniques at the secondary school level (24) showed that fewer than one magazine in ten was read because of teacher assignment and that the schools needed to do more in guiding pupils in the reading of magazines.


Summary 


An overview of evaluation procedures in the language arts reveals that these procedures are still in their infancy. Twenty years ago the objective best-answer type of examination was held up to be the ideal instrument for measuring outcomes in this area, but experience has shown that the typical paper-and-pencil objective examination has very limited value in measuring outcomes of teaching in this field. There is a real need for the development of new methods for appraising the oral and written compositions of students, and some of the techniques of appraisal described offer promise. Teachers would do well to experiment with tests similar to the poetry completion test and with short essay tests based on the same principle.


In general, procedures for measuring reading ability have been developed to a more advanced stage than most other measuring devices in the general area. However, there is a real need for the development of tests of listening comprehension, since they should throw considerable light on many learning difficulties encountered in schools.


The most promising developments of all those 
reviewed are the devices for determining the 
out-of-school reading interests and activities of 
children. The present writer believes that these direct measurements of the outcomes of instruction offer much more promise than the objective best-answer type of examination.